Non-Verbal Intelligence Tests
for Use in China
By
Herman Chan-En Liu, Ph.D.
Teachers College, Columbia University
Contributions to Education, No. 126
Published by
Teachers College, Columbia University
New York City
1922
Copyright 1922, by Herman Chan-En Liu
To my friends
Emilie Bretthauer
James H. Franklin
Andrew MacLeish
ACKNOWLEDGMENTS
Grateful thanks and appreciation are hereby expressed to Professor
Edward Lee Thorndike, of Teachers College, Columbia University,
under whose almost daily guidance and inspiration this
study has been carried out; to Miss Margaret P. Rae, principal of
New York Public School No. 108, for her ready cooperation and
assistance in making the experiment successful; and to Professor
William Anderson McCall, Professor Henry Alford Ruger, Miss
Ella Woodyard, of Teachers College, and a host of others for much
valuable aid.
Herman Chan-En Liu
CONTENTS
CHAPTER
I. Introduction
    A. The Problem
    B. Intelligence Examination in China
    C. The Development of Non-Verbal Tests in America
II. The Experiment
    A. The Preliminary Plan
    B. Tests Used in the Experiment
    C. Method of Procedure
III. Formation of a Criterion
    A. Elements of a Criterion
    B. Test Scores Weighting
    C. Method of Selection of the Final Criterion
IV. Selection of Test Elements
    A. Selection of Test Elements by Correlation Method
    B. Selection of Tests by Rating
    C. Selection of Tests by Partial Correlation
    D. Selection of Tests by Composite Method
    E. Weighting by Regression Equation
V. Re-testing
    A. Procedure of Re-testing
    B. Statistical Study
VI. Alternative Forms and Standardization
    A. Alternative Forms
    B. Standardization
        1. Norms
        2. Scaling
VII. The Chinese Non-Verbal Tests
    A. The Nature of the Tests
    B. Instructions to Examiner
    C. Directions for Giving the Tests
    D. Directions for Scoring the Tests
    E. Treatment of Results
    F. Caution
VIII. Summary and Conclusions
Appendices
    A. Samples of Form A of the Chinese Non-Verbal Intelligence Examination
    B. Samples of Records Kept
Bibliography
INDEX OF TABLES
NUMBER
I. Distribution of Age and the Numerical Values Assigned
II. Age Distribution Showing the Slope and the Increase of Scores as Age Advances (Data from Boys Who Have Taken the Pintner Non-Language Tests)
III. Grade Distribution of Pressey Scale
IV. Weighting of the Scales According to Q
V. Data for School Criterion (10 Selected Pupils)
VI. Data for Calculation of the Final Criterion (10 Selected Pupils)
VII. Correlations of Individual Tests with Final Criterion by Sheppard's and Product-Moment Methods
VIII. Correlations of Individual Tests with Combination of Beta 4 and 6, by Sheppard's Formula
IX. Correlations of the Different Scales with the Final Criterion and the Inter-Correlations of the Individual Tests
X. Correlations between the Individual Tests and the Basic Tests (Pintner 2 and 3, and Beta 6)
XI. Ratings of the Individual Tests by Competent Judges
XII. Individual Tests Rated in Application to Chinese
XIII. Correlations of the Individual Tests with the Final Criterion with the Elements of the Basic Tests Eliminated
XIV. Combined Value of the Individual Tests, as Determined by Ratings and Partial r Method
XV. Data for Calculation of Regression Equation
XVI. Distribution of Re-testing Scores by Grades
XVII. Distribution of Re-testing Scores by Ages
INDEX OF FIGURES
NUMBER
1. The Nine-Ring Puzzle
2. The Seven Mysterious Boards
3. Illustrations of the Seven Mysterious Boards
4. Illustrations of the Seven Mysterious Boards
5. Showing 27 Per Cent Overlapping of Grade III over Grade IV in the Scores of Pressey Primer Scale
6. Showing 21 Per Cent Overlapping of Grade III over Grade IV in the Scores of Myers Mental Measure
7. Showing 15.2 Per Cent Overlapping of Grade III over Grade IV in the Scores of Pintner Non-Language Tests
8. Showing Iv Per Cent Overlapping of Grade III over Grade IV in the Scores of Army Beta Examination
9. Showing 9.8 Per Cent Overlapping of Grade III over Grade IV in the Scores of Dearborn Group Tests of Intelligence
CHAPTER I
INTRODUCTION
A. THE PROBLEM
Psychological tests which have been applied in America with
great success are now being experimented with in China. Progressive
Chinese educators who are attempting to introduce the measurement
movement into China, however, are confronted with the
problem of procuring and selecting suitable test material. China,
with its distinctive civilization and numerous dialects, presents a
difficult field for the literal transcription of the American intelligence
tests. This difficulty virtually prevents a widespread use in China
of the language test, and makes necessary the construction of a non-
language test. The present study is an attempt to develop a non-
verbal scale, which, because of the elimination of language and
schooling factors, may be used in China as an independent measure
of general intelligence or as a supplement to a language test.
B. INTELLIGENCE EXAMINATIONS IN CHINA
The practice of setting intelligence examinations is not new in
China. It is as old as our history, although the traditional methods
have been crude and pseudo-psychological.
The earliest methods, which still prevail, are Kan Hsiang, phys-
iognomy-reading, and Shan Ming, fortune-telling. Pseudo-psy-
chologists in the guise of fortune-tellers and popular physiognomists
are found everywhere. They are frequently consulted by unedu-
cated parents as to the intelligence of their children, whose careers
and destinies they foretell. The calculations of these pseudo-psy-
chologists are said to be based upon the hour and date of birth, and
physiognomic and anthropometric characteristics.
The system of competitive examinations, employed in China for
centuries, was a sort of intelligence test. Its purpose was the selec-
tion of candidates for civil service. Scholars gathered at the exam-
ination halls, which were located in every district. There they were
confined in little cells in which they composed classical essays on
assigned subjects. Examinations were conducted and papers graded
by high government officials. The results were announced with
great ceremony and the successful candidates honored with "Kung
Ming," the equivalent of American academic degrees.
The practice was founded on the theory that only the intelligent
and educated men should rule. No age or birth qualifications were
required for participation in these examinations. Youngsters under
twelve years of age, however, were sometimes released from the
rigid, formal standards. In such cases the regular examination was
often replaced by a series of "opposites or matching tests," in which
the applicants were required to match assigned words and phrases.
For instance, "East" would be expected to be matched with the
word "West"; "above" with "below"; "mountains" with "oceans."
The following is a typical "Dui Dzi," or opposites test:¹
(a) Chiang Fu Djoh Ma
(b) Wang Dzi Cheng Lung
The translation of matching phrase (a) with phrase (b) is as
follows:²
(a) Consider Father Being Horse
(b) Expect Son Becoming Dragon
Of the old intelligence tests used in the schools of China, there
were certain kinds called "Tien Dzih," that is, "completion tests."
Some teachers occasionally employed these tests in judging the
brightness of their pupils; others employed them as supplementary
¹ A story relates that a certain farmer carried his young son on his back to the
examination hall. The examiner, upon the arrival of the youngster, was surprised at
his presence and inquired of him how he had managed to come all the way from his
distant home. The boy replied: "I came on my father's back." The boy's answer
at once suggested to the examiner a topic for the opposites test, so he said: "Well,
if you can match the phrase which I am about to give you, you are passed." The
examiner then requested the boy to match "Consider Father Being Horse." The
clever child, without a moment's hesitation, replied: "Expect Son Becoming Dragon."
He had matched the assigned phrase so well that he was given a pass without further
examination.
² These are not strictly "opposites tests," as understood in America, but rather
matching tests. They are comprehensive, requiring on the part of the examinee
quick understanding and sound reasoning.
methods of teaching elementary composition. Problems in compo-
sition were often made by omitting a few words from a well-con-
structed sentence, necessitating the filling in of the blanks by the
children.
A type of test similar to the puzzles used by Ruger is also quite
common in China. The most famous of these puzzles is the "Kiu
Lien Huan," a nine-ring puzzle (see Fig. 1), consisting of nine connected
copper rings mounted on a bar with a rod running through the
center of the rings. The puzzle is how to get the rod out of the rings,
a task which requires reasoning, and which seldom is solved by
the trial-and-error method. The ring puzzle is used merely as a
toy, not as a formal test, yet one often hears the remark, "Solve
this puzzle and let us see how bright you are."
Fig. 1. The Nine-Ring Puzzle.
"Performance tests" also have been in use for centuries in China.
The most noted one is "Yih Chih Tu," also called "Tsih Chiao Pan"
(see Figs. 2, 3 and 4). Translated literally into English, it would
be called "Increasing Wisdom Board," or the "Seven Mysterious
Boards." It was called the "Increasing Wisdom Board" because
playing with it was believed to increase one's wisdom. It was called
the "Seven Mysterious Boards" because with the seven pieces of
different shapes and sizes which made up the game, many forms of
men, animals, birds, and inanimate objects could be constructed.
The game was played by any number of persons, each with his own
set of forms. The purpose was to see which person could construct
the largest number of objects out of his seven pieces, the winner
being considered the most intelligent person in the group.¹
¹ It is said that the game originated in the ancient imperial palace among the
women of the court, who, in the great amount of leisure time at their disposal,
welcomed such sport. Later it became popular among the people.
Fig. 2. The Seven Mysterious Boards.
Fig. 3. Illustrations of the Seven Mysterious Boards: 1, man walking; 2, a carriage; 3, man running; 4 and 5, ti...
Fig. 4. Illustrations of the Seven Mysterious Boards: candlesticks and different kinds of vessels.
The various tests which have been described here cannot be
termed "intelligence tests" in the strictly psychological sense because
they are not standardized. They are not extensively used as a
measure of general intelligence, but rather as intellectual games.
They do demonstrate, however, that the practice of intelligence
examinations, although crude and pseudo-psychological, does exist
and has existed in China for centuries. It is quite possible that
some of these old methods and materials may prove useful in the
construction of a genuine intelligence test for China.
It is only within the last few years that scientific psychological
measurements have been introduced in China. The earliest known
experimental work on the subject is that conducted by Dr. W. W.
Creighton.¹ From 1915 to 1917, under the direction of Professor
W. H. Pyle, of the University of Missouri, Dr. Creighton made a
study of the mental and physical characteristics of Cantonese chil-
dren. The subjects under examination numbered approximately
five hundred, most of them ranging from ten to eighteen years in
age, although twenty-five women were among those examined. The
mental tests used in this experiment were those of rote memory,
logical memory, substitution, analogies, and dot patterns. In con-
ducting this experiment Dr. Creighton met with great difficulties
as a result of the many dialects prevailing in this province. In his
report he says: "In the mental measurements we were confronted
at once by language difficulties."
In 1918 Professor G. D. Walcott² measured the intelligence of
the students in the senior class in Tsinghua College, Peking, who
averaged twenty-two years of age. Professor Walcott used the
Stanford Revision of the Binet Scale, with the Scott Group Test as
a check. The results of the experiment were not very satisfactory
as, in addition to the insufficiency of the scale for persons of that age,
the language difficulties were insurmountable.
Somewhat later, in the fall of 1920, the Nanking Government
Teachers College tried psychological tests for the entrance examina-
tion. This is the first attempt made by Chinese educators to
¹ Pyle, W. H.: "A Study of the Mental and Physical Characteristics of the Chinese,"
School and Society, Vol. VIII, No. 192 (August 31, 1918), pp. 264-69.
² Walcott, G. D.: "The Intelligence of Chinese Students," School and Society, XI,
1920, pp. 474-80.
introduce scientific intelligence tests into China. Two psychologists
educated in America, Professors H. C. Chen and S. C. Liao,¹
devised five tests. The correlation of these tests and the average
grades of the regular examination was .39.²
Psychological tests for entrance examinations were next taken
up by the Peking Government Teachers College. The correlation
between the tests and the average grades of the regular examination
was practically zero (.000046).³
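The coefficients reported for the two Teachers Colleges are product-moment correlations between test scores and average examination grades. As a minimal sketch of that computation, the following uses hypothetical scores for five candidates; the function name and the figures are illustrative assumptions and are not data from either college.

```python
from math import sqrt


def product_moment_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each score from its own mean.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("correlation is undefined when one list has no spread")
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical scores (assumed for illustration only).
test_scores = [12, 15, 9, 20, 14]
exam_grades = [61, 70, 58, 82, 66]
r = product_moment_r(test_scores, exam_grades)
print(round(r, 2))
```

A value near 1 would mean the tests rank candidates almost as the regular examination does; a value near zero, as reported for Peking, would mean the two rankings are practically unrelated.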
At the present time, Chinese progressive educators, especially
those trained in America, are eager to introduce the use of psychological
tests into China. Institutions, such as Nanking and Peking
Teachers Colleges, as indicated above, have already started the
movement. A few private and missionary schools have also adopted
some form of tests. The Stanford-Binet Scale has been translated,
although it is little used. Aside from these isolated experiments,
however, very little has been done. Psychological tests remain
virtually unknown. Here lies a great unexplored field of endeavor
for the young Chinese schoolman trained in modern scientific
method. He needs to understand, however, the difficulties which
the use of the numerous dialects and the large percentage of illiteracy
offer to the use of any language scale. Evidently a non-verbal
test may hope to succeed where the language test is totally inade-
quate. The development of such a non-language scale is the pur-
pose of the present study.
C. THE DEVELOPMENT OF NON-VERBAL TESTS IN AMERICA
Psychological tests may be roughly classified into two main
groups: namely, language tests and non-language tests. The for-
mer includes those tests which require verbal response from the sub-
ject. The latter group of tests does not require such verbal response.
The non-language tests, again, may be subdivided into a group of
performance tests which require the doing of some task by means of
¹ Journal of Educational Research, Vol. III, No. 5 (May, 1921), p. 394.
² As this goes to press, the author has received a copy of Mental Tests in China,
written by Professors H. C. Chen and C. S. Liao. It contains thirty-five different
tests, twenty-four of which are translated from American tests.
³ The data are found in the Peking Teachers College Weekly, No. 133 (September
11, 1921), p. 3, but the correlation was computed by the author by the product-moment
method.
certain actual mechanical manipulations, and a group which require
the subject to work with geometrical designs, figures or pictures,
indicating the results of his thinking by making lines or pictures.
Non-language or non-verbal intelligence tests are the outgrowth
of the intelligence measurement movement. Of recent development
and used extensively only within the last two or three years, these
non-verbal tests have shared the fame of the language tests. Among
the tests devised by Alfred Binet, father of the movement for the
measurement of intelligence, and published in his 1905-06 series are
a number of tests which do not require verbal responses.¹ For
example, in the visual coordination test, the examiner moves a
lighted match slowly before the subject's eyes and notes whether he
follows the movement with the properly coordinated movements of
the head and eyes. In the test known as "prehension provoked
tactually," he places the small wooden cube in contact with the
palm or the back of the subject's hand to determine whether he can
execute properly coordinated movements of grasping. In the drawing
test, he shows the subject two drawings, permits him to look at
them for ten seconds, and then requires him to draw the views from
memory. None of these tests expects a verbal response from the
subject.
The scale devised by these French psychologists, Alfred Binet
and T. Simon, was first translated and adapted for American use
by Goddard.² Kuhlman³ and Wallin⁴ followed with further adaptations.
The latest revisions of the scale are by Yerkes, Bridges,
and Hardwick,⁵ by Terman,⁶ and by Herring.⁷ They all adhere
¹ The other non-language tests in the Binet-Simon 1905 series are tests numbered
3, 4, 5, 10, 13. II. II. 13. and 19. In the 1908 series the non-language tests are numbered
g, 10, It, 13, 14, t6, 33, 34. 33. 54. For the complete account see A. Binet
and T. Simon, "Le développement de l'intelligence chez les enfants," in L'Année
Psychologique, 14, 1908, pp. 1-94; and A. Binet and T. Simon, "L'intelligence des imbéciles,"
in L'Année Psychologique, 1909, pp. 1-147.
² Goddard, H. H.: "The Binet-Simon Measuring Scale of Intelligence, Revised,"
Training School Bulletin, Vol. VIII (1911), pp. 56-63.
³ Kuhlman, F.: "A Revision of the Binet-Simon System for Measuring the Intelligence
of Children," Journal of Psycho-Asthenics, Monograph Supplement, No. 1, p. 41.
⁴ Wallin, J. E.: Experimental Studies of Mental Defectives: A Critique of the Binet-Simon
Tests.
⁵ Yerkes, R. M., Bridges, J. W., and Hardwick, R. S.: A Point Scale for Measuring
Mental Ability.
⁶ Terman, Lewis M.: The Measurement of Intelligence.
⁷ Herring, John P.: Significance of Certain Elements in Intelligence Examinations.
Unpublished Ph.D. dissertation (Columbia University).
more or less closely to the original Binet Scale, and consequently
some of their tests are non-verbal in nature.
In spite of the merits of the Binet-Simon Scale and its revisions,
their chief deficiency lies in the large proportion of tasks requiring
language responses. This criticism of the scale was vigorously pre-
sented by Ayres in 1911. He pointed out that the Binet tests pre-
dominantly reflect the child's ability fluently to use words, and do
not reveal his ability to do acts. Thus, it gives "a warped and partial
measure of his real degree of intelligence."¹
The language difficulty, inherent in the Binet-Simon Scale and its
various revisions, became evident when the clinical psychologists
attempted to apply it in various fields of practical work. They
found that the Scale was utterly inadequate for the mental examina-
tion of non-English speaking people, speech defectives, the deaf, and
those with language difficulties. Hence they introduced non-lan-
guage tests which do not require language responses on the part of
the child for adequate performance. Among those who first used
the non-language test were Healy and Fernald.² In carrying out
mental examinations at the Juvenile Psychopathic Institute of
Chicago, they had been confronted with the problem of testing a
cosmopolitan population. Some of the inmates were illiterate, and
some, though educated in their own tongue, were unable to speak
the English language. Since they represented most of the nation-
alities and languages of Europe no single test requiring language
directions and responses could be adequate to measure them. In
discussing their work, Healy and Fernald say: "The Binet-Simon
Scale helps little where the language factor is a barrier, either on
account of foreign parentage or insufficient schooling, or with uneducated
deaf and dumb children."³ They became convinced that
language, as far as possible, should be eliminated from the mental
examinations given to such subjects. They say: "In predicting the
possible development of an individual under various conditions, it is
most desirable to ascertain the mental ability quite apart from the
individual's experience in formal training in our language, or indeed
¹ Ayres, L. P.: "The Binet-Simon Measuring Scale for Intelligence: Some Criticisms
and Suggestions," Psychological Clinic, Vol. V (1911), pp. 187-96.
² Healy, W., and Fernald, G. M.: "Tests for Practical Mental Classifications,"
Psychological Monographs, Vol. 54, No. 2, pp. 4-5.
³ Ibid.
any language. It often becomes necessary to classify mentally a
subject who has had no education in English-speaking schools, or
indeed who has had but little schooling of any kind."¹
The work carried on at the Institute not only proved the inade-
quacy of the language tests, but demonstrated the practical value
of the non-language tests. Healy and Fernald conclude as follows:
"On one occasion we found ourselves able to demonstrate satisfac-
torily that a Gypsy boy of fifteen, quite innocent of schooling and
knowledge of the three R's, had at least fair, if not good, native
ability. And repeatedly a number of our tests have proved most
serviceable in mentally classifying young, deaf and dumb children."²
Knox,³ in his work among the immigrants at Ellis Island, found it
impossible (even with the services of an interpreter) to use scales in
which language responses were required. Faced with this language
obstacle, and under the necessity to diagnose mental disease and
mental deficiency among the immigrants, Knox devised a series of
non-language tests, many of which are excellent and still widely
used in psychological clinics.
Pintner and Patterson⁴ also found the language scale "absolutely
inadequate to test the mentality of deaf children." They experi-
mented with the Binet-Simon Scale, but were confronted with
numerous difficulties, such as lack of comprehension of certain tasks
due to physical deficiency which in turn had made for lack in the
environment of opportunity for forms of experience needed to ac-
quire the proper test reaction. Consequently, they constructed a
scale of performance tests which requires practically no instructions
for the child other than natural gestures. Pintner and Patterson
consider the non-language feature of the test as a sine qua non in
the measurement of mentality in the deaf. As to the importance of
the non-language tests, they say: "Here we have a group of indi-
viduals, completely shut off from hearing language, and for that
reason laboring under a language difficulty that only in rare cases is
surmounted to the extent of making them comparable in language
³ Knox, H. A.: "A Scale Based on the Work at Ellis Island for Establishing Mental
Defects," Journal of the American Medical Association, Vol. LXII (March 7, 1914),
pp. 741-47.
⁴ Pintner, R., and Patterson, D. G.: "The Binet Scale and the Deaf Child," Journal
of Educational Psychology, Vol. VI (1915), pp. 201 ff.
ability to ordinary hearing individuals. Any kind of tests involving
reading or spoken language cannot be used as a test of their mental-
ity. If we employ such tests for measuring the mentality of the deaf
and use the standardization obtained from hearing children, we will
not be measuring mentality but merely difference in language abil-
ity. There may be a greater percentage of feeble-mindedness
among the deaf than among the hearing but the fact that a deaf child
does not measure up to the language standard of a hearing child is
no indication of mental deficiency." The performance tests lately
have been used not only for the deaf but also for the non-English-
speaking children, speech defectives, and children from different
language environments.
The development of the non-language tests was greatly advanced,
and their practical value definitely recognized, as a result of the
United States Army psychological examinations.¹ In 1917, when
the psychologists took up the personnel work in the Army they soon
discovered that many of the men were handicapped by language
difficulties. In order to permit the illiterates a real opportunity
to show their ability, a non-language scale was constructed. Dem-
onstration charts and pantomime were used to convey the instruc-
tions to the examinees. These methods require no language direc-
tions or responses. This scale, known as the Army Beta Examina-
tion, consisted of seven tests, including maze, cube analysis, X-O
series, digit-symbol, number checking, pictorial completion, and
geometrical construction. The scale was applied to 23,547 men. Its
results correlate with the Army language examination Alpha to the
amount of .80; with Stanford-Binet, .73; with the composite of
Alpha, Beta, and Stanford-Binet, .91. This high correlation demonstrates
the practicability of making non-language tests and the
feasibility of their use where the language tests fail utterly. The
unexpected efficiency of the Army Beta Examination thus demon-
strated during the war, later brought about a mushroom growth of
the non-verbal test material. Thorndike,² champion of the measurement
movement, who had charge of much of the statistical work in
the development of the Army tests, was first to utilize the data and
experience gained from these tests. The Thorndike Non-Verbal
¹ Yoakum, C. S., and Yerkes, R. M.: Army Mental Tests.
² Thorndike, E. L.: "A Standard Group Examination of Intelligence Independent
of Language," Journal of Applied Psychology, Vol. 3, No. 1 (March, 1919), pp. 13-31.
Examination follows the general nature of the Army Beta, but elim-
inates one weakness by providing ten alternative forms of the exam-
ination instead of the single form, thus reducing the error in measure-
ment caused by unfair tutoring. Such alternative forms widen the
field of usefulness of tests in many ways, permitting a study of the
growth of intelligence by repeated testing, comparison of groups
and individuals and increased reliability in the determination of the
intelligence of groups and individuals.
Pintner,¹ who with Patterson constructed the Performance Scale,
has also, since the war, devised a non-language group intelligence
test. He realized that his Performance Scale, although it required
no language response, was still clumsy and not convenient for appli-
cation to a group. Consequently, in his later scale he devised a set
of six non-language tests for group use. When compared with the
results obtained from the Binet-Simon Scale the correlation was
found to be .66. He recommends that such tests be used in mental
survey work for school children and adults, particularly in communi-
ties containing a large foreign or illiterate element.
In addition to the non-verbal tests which have already been dis-
cussed, there are many others available. Among the more well-
known scales are Myers' Mental Measure, Pressey's Primer Scale,
Kingsbury's Primary Group Intelligence Tests, and Dearborn's
Group Intelligence Tests. All these tests have been widely em-
ployed, with varying degrees of success, by psychologists.
The rapidity of the development of non-language tests has been
phenomenal and indicates that it meets an important need. In the
Binet-Simon Scale, there were only a few tests which required no
language responses. Then followed the performance scales, devel-
oped by Healy, Knox, Pintner and others, in which language re-
sponses are completely eliminated. The Army Beta Examination,
with its wide application among the millions of soldiers, demon-
strated its practical value for intelligence measurement and for
group use. Others have succeeded in advancing the non-verbal
tests beyond the experimental stage. These tests are now applied
to individuals and groups, both as an independent measure and as a
supplement to language tests, with confidence that the results are
trustworthy and fairly adequate.
¹ Pintner, R.: The Mental Survey.
CHAPTER II
THE EXPERIMENT
A. PRELIMINARY PLAN
In drafting a preliminary plan for the experiment it was first
decided to devise a large number of tests and to try them out on
Chinese children in America. Since the purpose of the experiment
was to develop a non-verbal intelligence scale for use in China, it
appeared essential that the subjects be Chinese. Ten non-verbal
tests were consequently constructed and mimeographed for trials.
Fifty-one persons were examined with these tests, after which the
examinations were discontinued as impracticable. The reasons for
the disuse of the examinations were threefold: first, the tests were
constructed by the subjective method instead of by the objective
or scientific method; second, the tests were mimeographed instead
of being printed, causing the test material to be in many instances
indistinct and difficult of recognition; third, the scarcity of Chinese
subjects, and the difficulty of dealing with the few which were
available. Three months' time and much labor had been expended,
and naturally the results were discouraging. An important fact,
however, was revealed by these examinations; namely, the children
of naturalized Chinese and of Chinese long resident in this country
had been affected by their American environment and training, so
that they were more American than Chinese. Tests which were
applicable to American-Chinese children would be quite irrelevant
if applied to children in China. As a result of these findings, the
mimeographed tests were abandoned and thought was turned to the
formulation of a new plan.
A careful study was then made of all the available intelligence
tests, especially the non-verbal forms. The new plan under con-
sideration was to select the best elements in the American non-verbal
tests and to attempt to develop them into a non-verbal scale for use
in China. At the time (1920), there were available the following
non-verbal and semi-non-verbal tests:
1. Army Beta Examination
2. Dearborn Group Tests of Intelligence, Series I
3. Haggerty Intelligence Examination Delta 2
4. Holley Picture Completion Test for Primary Grades
5. Myers Mental Measure
6. National Intelligence Tests
7. Otis Group Intelligence Examination
8. Pintner's Mental Survey Tests
9. Pressey Primer Scale
10. Trabue Mentimeters
The question arose whether all of these tests, or whether any of
them, could be used in the experiment. Before coming to a decision,
it was necessary to formulate definitely the principles to be embodied
in the proposed scale for use in China. After considerable study, the
following principles were adopted as criteria:
1. Tests should involve no language responses from the subjects.
2. Test materials should be drawn from social environment com-
mon to all peoples.
3. Test material should exclude, as much as possible, school
training.
4. Test material should be of interest to all types of subjects.
5. Tests should be valid as a measure of intelligence.
6. Tests should be reliable.
7. Objective methods should be employed in both giving and
scoring of tests.
8. Tests should measure a wide range of intelligence.
9. Tests should indicate mental growth.
10. Tests should be adapted for group use.
11. Time for testing and scoring should be reasonably short.
12. Instructions for testing and scoring should be simplified for use
by teachers and others who are not specialists in measurements.
13. Tests should have alternative forms as a preventive against the
vicious effect of coaching.
14. Test material should be inexpensive, easy to handle, of small
bulk, and easily kept in order.
1. Tests should involve no language responses from the subject.
General intelligence signifies a group of related inborn capacities
for adapting one's self to specific situations in life. Inborn capacities,
however, are never measured directly but are always inferred
from the ability displayed. Language use is one of these abilities
which ordinarily is a good index to intelligence, but it has its limitations.
It cannot be employed, for instance, as a medium to
measure intelligence when the language varies among the subjects
under examination sufficiently to make understanding or executing
the tasks difficult, slow, or impossible. Such a condition exists in
China. The languages spoken in various sections differing widely,
people from Peking do not understand the dialect of Canton, and
the Shanghai dialect is different from that of Hankow. This diver-
sity of dialects is not only characteristic of the provinces but exists
in local districts of the same province. The written language, it is
true, is identical throughout China, but comparatively few can read,
90 per cent of the Chinese people being illiterate. Under these
conditions, a non-verbal test for use in China would have great
superiority over any existing language test.
2. Test material should be drawn from social environment common to
all peoples.
It is a well-known fact that social environment affects the development
of intelligence. Edison, born and raised in the wilds of
Thibet, would doubtless never have developed into the particular
kind of a mechanical genius he now is. To measure a Thibet-born
Edison by the standards used in examining an American-born
Edison, would manifestly be inaccurate and unfair. The uncivilized
Miaotze boy in Yunnan could not be expected to answer questions
on automobiles or airplanes; and the New York boy, raised in the
Bronx, could not be expected to answer intelligently questions on
rice growing. There should be common grounds; the test material
should be drawn from an experience common to all. Tests should
measure capacity, and this can be accomplished by measuring only
those traits possible of development by all subjects. Tests, based
on such a principle, could be employed over all of China.
3. Test material should exclude, as far as possible, school training.
As ninety per cent of the Chinese people are illiterate, test
material which requires school training must prove inadequate.
Culture and school training are both acquired, not innate. They
vary in different persons according to the environment to which
they have been subjected. The boy ignorant of mathematics could
not be expected to solve problems in algebra as well as the son of an
instructor in mathematics. In order to compare the native ability
of children, therefore, the products of school training should be
excluded from the test material.
4. Test material should be of interest to all types of subjects.
Interest in the tests is essential to proper reaction; therefore, a
good test should arouse interest in the subjects of widely differing
mentality and type of intellect. Unless this is accomplished, the
results of the test will not indicate the actual intelligence. Errors
have been made in drawing conclusions as to the intelligence of the
individuals in a group, when these individuals have had interests
different from those called out by the test. For instance, a mechan-
ical test given to a co-educational class usually results in a higher
score for the boys than for the girls. The scores in this case do not
prove that the boys are more intelligent than the girls; they prob-
ably indicate rather the difference in degree of interest in the sub-
ject between the boys and girls. It is, therefore, evident that the
tests to be adequate must be of common interest to the entire group.
5. Tests should be valid as a measure of intelligence.
A test is valid when it actually measures the trait which it pro-
fesses to measure. A valid test, therefore, implies actual, con-
sistent measurement. Whether a test is valid or not is determined
by the correlation of test scores and the elements of the intelligence,
as objectively known by other means. The checks on validity most
often used are school marks and progress, and estimates by teachers
and associates. In applying this principle, the reliability of such
checks should be investigated.
6. Tests should be reliable.
Reliability in a test indicates the obtaining of similar results from
two or more testings of the same subjects under the same conditions.
Perfect reliability implies identical results from two or more testings
under identical conditions, and is, therefore, never completely
attained; but competent authorities agree that the coefficient of
reliability should be .90 or higher for a group of equal age.
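As an illustration of how such a coefficient might be computed, the following minimal sketch calculates the product-moment correlation between two testings of the same pupils. The function name `pearson_r` and the scores are hypothetical and are not data from the experiment. The same calculation would apply when test scores are correlated with an outside criterion, such as teachers' estimates, as a check on validity.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores of six pupils on a first and a second testing.
first_testing = [12, 15, 20, 22, 30, 31]
second_testing = [14, 15, 19, 25, 29, 33]
reliability = pearson_r(first_testing, second_testing)
print(round(reliability, 2))
```

A coefficient near 1 indicates that the two testings rank the pupils in nearly the same order; the .90 standard named above would be applied to the value returned.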
7. Objective methods should be employed in both giving and scoring
of tests.
Objectivity is attained when the methods and procedure of testing
and scoring are uniform and independent of personal opinion so that
the results may be verified by other testers. That is to say, methods
of testing and scoring should be identical at all times for all testers.
The personal equation of the teacher should be eliminated as far as
possible. The results of the testing should endure verification in
all cases where the same tests are applied to the same subjects,
using the same methods under similar conditions.
8. Tests should measure a wide range of intelligence.
The term "general intelligence" means the combination of many
mental traits. It varies in amount in individuals from practically
zero in the lowest grade of idiots to that large quantity, at present
unmeasured, of the world's greatest genius. Its distribution,
according to the best available estimates, approximates a bell-
shaped curve; that is to say, there are few of genius level, a large
number of ordinary or average people, and comparatively few
idiots. An intelligence test, to be entirely satisfactory, should be
easy enough for all except the hopeless idiots to make some score
and sufficiently difficult for a person of great genius not to make a
perfect score. On the other hand, the scores should be distributed
continuously and around one mode. Furthermore, the tests should
measure a large number of unlike or differentiating traits. The
ideal way would be to measure every trait that contributes to intel-
ligence and to give each trait a weighting proportional to its con-
tribution to the total intelligence. This is impossible in our present
state of knowledge, but an intelligence test should measure as many
differentiating traits as possible.
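One simple check on the range of a scale, offered here only as a sketch, is to count how many subjects make no score at all and how many make a perfect score. The function name `floor_and_ceiling_share` and the scores below are hypothetical. Large shares at either extreme would suggest that the scale is too hard or too easy for the group examined.

```python
def floor_and_ceiling_share(scores, max_score):
    """Return the shares of subjects scoring zero and scoring the maximum."""
    if not scores:
        raise ValueError("no scores given")
    n = len(scores)
    zero_share = sum(1 for s in scores if s == 0) / n
    perfect_share = sum(1 for s in scores if s == max_score) / n
    return zero_share, perfect_share

# Hypothetical scores of ten subjects on a scale with a maximum of 20 points.
scores = [0, 3, 5, 7, 7, 8, 10, 12, 20, 20]
zero_share, perfect_share = floor_and_ceiling_share(scores, max_score=20)
print(zero_share, perfect_share)  # 0.1 0.2
```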
9. Tests should indicate mental growth.
Intelligence develops along with the advance of chronological age
up to a point believed to be somewhere near the end of the adolescent
period. As the child grows older, his native endowment unfolds.
So a normal ten-year-old child should be able to do more than the
eight-year-old child and a normal eight-year-old child should know
more than a six-year-old child. The intelligence test should reveal
the different stages of development by improved scores with each
increase in chronological age. This mental index is known as mental
age. The mental age divided by the chronological age gives what
is known as the intelligence quotient.
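The arithmetic of this principle can be sketched as follows. The function names, the use of months as the unit of age, and the figures are assumptions made for illustration only. The quotient is returned as a plain ratio, as described above; published tables often multiply it by 100.

```python
def intelligence_quotient(mental_age_months, chronological_age_months):
    """Return mental age divided by chronological age, as a plain ratio."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age_months / chronological_age_months

def scores_rise_with_age(mean_score_by_age):
    """True if the mean score strictly increases from each age to the next."""
    ages = sorted(mean_score_by_age)
    means = [mean_score_by_age[age] for age in ages]
    return all(earlier < later for earlier, later in zip(means, means[1:]))

# Hypothetical child: mental age 10 years, chronological age 8 years.
iq = intelligence_quotient(10 * 12, 8 * 12)
print(iq)  # 1.25

# Hypothetical mean scores at ages six, eight, and ten.
print(scores_rise_with_age({6: 10.0, 8: 14.5, 10: 19.0}))  # True
```

The second function is one possible way of checking that a scale shows improved scores with each increase in chronological age.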
10. Tests should be adapted for group use.
Group testing enables the examiner to test many persons at a
time and therefore makes possible the testing of many more children
with the same expenditure of time, labor and money, than can be
achieved by testing them singly. The success of the group-testing
method was shown in the United States Army during the war. To
test two million soldiers individually in so short a time was totally
impossible, but by means of group tests the men were speedily sorted
and classified. Group testing may not give such an accurate diag-
nosis as does individual testing, but it is generally satisfactory and
can be supplemented by individual tests in exceptional cases. For
general use in China, the tests must be adapted for group use.
11. Time for testing and scoring should be reasonably short.
Time for testing should be long enough for the average subject
to give response without hurry, but it should be reasonably short
so as not to cause fatigue in the subjects nor to entail such adminis-
trative inconvenience as to prevent its use. If two scales, for in-
stance, give the same result, and one takes thirty minutes to give,
while the other takes two hours, the former is certainly preferable
to the latter. As to scoring, the test should be constructed so that it
may be accurately, uniformly, and rapidly scored with little depend-
ence upon the judgment of the persons doing it. Mechanical scor-
ing devices should be employed whenever advisable.
12. Instructions for testing and scoring should be simplified for use
by teachers and others who are not specialists in measurement.
There are not many psychologists in China. Most of the measure-
ment work probably will be done by the ambitious teachers and
others who are not specialists in measurements. To facilitate the
work, it is absolutely necessary that the instructions for both testing
and scoring should be simplified so that they can be followed easily.
The instructions should be clear, concise, and adequate, but must
be brief, consistent, and uniform for all who are to be testers.
Whenever possible, instructions should employ a preliminary dem-
onstration test in order that the subjects may understand clearly
what they are expected to do.
13. Tests should have alternative forms as a preventive against the
vicious effect of coaching.
The one-form scale has at least two defects. First, if the tests
are to be used as a basis for promotion in education or business,
ambitious parents will be likely to purchase the material and coach
their children with the object of increasing their scores. Second,
the one-form scale cannot be used in retesting for a study of mental
growth. Therefore, alternative forms should be prepared. They
should have the same value, however, as the original form, and
measure the same traits.
14. Test material should be inexpensive, easy to handle, of small bulk
and easily kept in order.
As communication is inconvenient in some parts of China and the
merit of intelligence measurement is not as yet widely demonstrated
there, it is important that every advantage be taken to facilitate the
use of the tests. They should, therefore, be easy to handle; they should
not be bulky nor contain apparatus which is difficult to keep in
order; and the cost of the test material should be small.
B. TESTS USED IN THE PRESENT EXPERIMENT
In consideration of the above adopted principles, a selection was
made from the ten non-verbal tests listed on pages 12 and 13 of the
following tests, to be used in the experiment:
1. Myers Mental Measure
2. Pintner's Non-language Tests
3. Pressey Primer Scale
4. Army Beta Examination
5. Dearborn Group Examination, Series I
General Examination 1
General Examination 2
General Examination 3
A brief description of each of these tests follows:
1. Myers Mental Measure.^
' School and Society, Vol. 10, pp. 353-60 (ig'B).
The Myers Mental Measure was devised by Carolyne E. Myers
and Garry C. Myers for school use, the Measure being based upon
the Army Beta tests. Mr. and Mrs. Myers were interested in the
classification of children as early as possible on the basis of intelli-
gence in order that children of marked ability might be selected for
rapid advancement, and that those of very low grade intelligence
might early be segregated. To do this, they devised a scale, universal
in nature, with the hope that it could be applied to school
children of all ages and given in 15 or 20 minutes to a large number
of individuals.
The scale consists of four tests, all of which are pictures. The first
test is called a directions test. It requires the child to obey certain
directions, such as to draw a line or make a mark in a particular way.
It furthermore needs no preliminary demonstration other than a
brief pantomime with very little spoken instruction. The second
test is a picture-completion test consisting of pictures of familiar
objects or situations, with one important element missing which
the subject must supply. The third is a learning test which requires
the subject to make substitution of proper symbols for other sym-
bols, while the fourth is a common element test in which the subject
is asked to mark the pictures which are similar in some way.
Mr. and Mrs. Myers used the Stanford-Binet Scale as a check
upon their own scale. Omitting test 3, which gives practically
zero correlation, the total of tests 1, 2 and 4 correlates about .So
with Stanford-Binet.
2. Pintner's Non-Language Tests.^
Pintner's Non-Language Tests were devised by Professor Rudolph
Pintner with the purpose of measuring the general intelligence of
the deaf, illiterate, and non-English speaking. A knowledge of
English is not needed either to understand the directions or to make
responses. The scale consists of six tests which have been arranged
for group testing, suitable for children and for adults. The first is
the imitation test which is essentially the same as the Knox Test.
The second and third are "easy learning" and "hard learning" tests
respectively. The task in the next one is a "drawing completion"
test, which is an abbreviated form of the larger drawing test devised
'Pintner, R.: "A Non-language Group Intelligence Test." Journal of Applied
Psychology, Vol. III, No. 3 (September, 1919).
by the same author. The fifth is the "reversed-drawing" test,
which requires the subject to draw the reversal, or counterpart of a
drawing given. The last test is "picture-reconstruction," involving
the rearranging of picture sections with the object of completing the
entire picture. All the correlations between each test and the total
score were found positive and fairly high; and the correlation between
the I Q on the Stanford-Binet and the percentile rank on the
Pintner's Non-Language Tests was .66.
3. Pressey Primer Scale.*
Pressey Primer Scale is known as the "crossing-out" test. As
the authors describe, "each test asks of the subject that by crossing
out some one thing, he eliminate a wrong, irrelevant, or extreme
element in a situation." The scale was devised for the use of the
first three grades. In the first test, the subjects are required to
cross out an unnecessary dot in each of several groups of dots. The
second involves the crossing out of the most discordant, or dis-
similar object from a group of three objects; the third, for the
crossing out of the superfluous block in each square, after the other
blocks have been fitted into four patterns at the top of the page; and
the fourth test provides for the crossing out, in each picture, of the
absurd part.
4. Army Beta Examination.*
The Beta Examination was introduced primarily for the group
testing in the Army during the World War of those illiterate in
English. Instructions were given in the form of four demonstrations
at the beginning of each test with gestures and pantomimes. The
original or trial series consisted of fifteen tests, but after an extensive
trial, seven tests were finally retained. These tests are known as
maze, cube analysis, X-O series, digit-symbol, number checking,
pictorial completion, and geometric construction. The maze test,
devised by C. R. Brown, was retained from the preliminary trials
because it could be successfully demonstrated, gives few zero scores
and correlates fairly well with the total scores of Army Alpha and
'Pressey, S. L. and Pressey, L. W.: "Cross-out Tests." Journal of Applied
Psychology, Vol. III (1919), pp. 143-150.
* See Yerkes and Yoakum, Army Mental Tests; also Memoirs of the National Academy
of Sciences, Vol. XV.
Beta. The cube analysis test was originally devised by Edwards at
Camp Lee, to take the place of the usual form of test for arithmetical
reasoning. Test 3 (X-O series) was an attempt to provide the
equivalent of test 8 of Alpha. It proved to be an easy and effective
way to indicate the institutional feeble-minded group. The digit-
symbol test was modeled after the well-known substitution test
which had been used in various forms by Woodworth, Pintner,
Whipple, and others. Number checking was devised by Thorn-
dike, and found satisfactory on all counts. The pictorial com-
pletion test was devised by Kelley and patterned originally after
the Binet mutilated pictures. The last test, geometrical construc-
tion, was patterned after the various form-board tests. It was
found particularly good in picking out the higher levels of ability.
The product-moment coefficient of correlation between the Beta
Examination weighted score and Stanford-Binet mental age was
reported to be .731 ± .012.
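A figure written after "±" in reports of this period is commonly a probable error. Assuming that convention, and assuming the classical formula 0.6745 (1 - r²) / √N, the calculation might be sketched as below. Both are assumptions rather than statements from the source. The function name `probable_error_of_r` and the sample size used are hypothetical, since the number of cases behind the reported coefficient is not given here.

```python
import math

def probable_error_of_r(r, n):
    """Classical probable error of a correlation: 0.6745 * (1 - r**2) / sqrt(n)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

# r is the coefficient quoted in the text; n = 500 is a hypothetical sample size.
pe = probable_error_of_r(0.731, 500)
print(f"{0.731:.3f} ± {pe:.3f}")
```

Under this formula the probable error shrinks as the number of cases grows, so a small value such as .012 implies a fairly large group.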
5. The Dearborn Group Tests of Intelligence, Series I.'
The Dearborn Group Tests of Intelligence, Series I, were devised
and standardized by Professor W. F. Dearborn, of Harvard. They
are not linguistic, and consist of three parts (known as General
Examinations 1, 2, and 3) for use in the first three grades. General
Examination 1 contains a "directions test," a "clock test" and a
"circus" test. General Examination 2 consists of seven "games"
which, in order, are "color blocks," "substitution," "ladders,"
"picturemaking," "picture recognition," and "dominoes." General
Examination 3 consists of "picture completion," "map of town,"
"ruler," and "number form puzzles." A correlation of .87 of the
Stanford-Binet Scale with Dearborn tests has been reported.*
C. METHOD OF PROCEDURE
The present testing was carried on in Public School No. 108, situated
in the section of New York City which is inhabited
by immigrants. This school has only the kindergarten and
the first four grades. Each grade is divided into two sections, so there
are altogether nine sections in the school. During the fall of 1920
when the experiment was started, the enrollment was about 1,000.
After a preliminary trial, it was found impossible to test all the
pupils, as those who were in the kindergarten and the first grade
could not follow the directions of the tests satisfactorily. They
were eliminated from the testing and only the children from grades
2B to 4B were tested. In these grades, there were 185 boys
and 216 girls, a total of 401. The distribution of the children
according to nationalities was as follows:
[Table: distribution of the children by nationality. Surviving row labels: Jewish-Italian, Chinese-Jewish, Spanish-Italian.]
Only a few of these children were Chinese; more than 90 per cent
were Italians. However, since the purpose of the experiment was
to select the best non-verbal tests, and since special forms for use in
China would have to be made later and no norms were expected to
result from the testing, the nativity of the subjects was wholly
immaterial. Prior to this experiment, the school had never used
any standardized psychological or educational tests. The principal
and the teachers were all deeply interested in the experiment and
offered every possible assistance to make it a success. The writer
took advantage of this unusually excellent opportunity to visit the
school frequently and make friends with both teachers and pupils.
In consequence, when he was ready to test the children, although
a foreigner, he was no longer a stranger to the school population.
All the testing was done in a large classroom equipped with desks,
blackboard and comfortable chairs. Twenty-eight pupils were
brought to this room, to be tested, at one time. The pupils were
seated apart from each other, so the possibility of copying was
reduced.
Before giving a test to the children, the writer familiarized himself
with the instructions by trying them with other children. All the
examinations were conducted by the writer himself with the assistance
of the principal, Miss Rae, and a college trained teacher.
Pains were taken to maintain uniformity both of the procedure in
the testing and the environment in the room. The order of testing
was always from the younger pupils to the older ones; that is,
from Grade 2B to 4B. The testing time was from 10 a. m. to 12 m.
and from 1 p. m. to 3 p. m. Every effort was made to make the testing
informal and pleasant yet stimulating and searching.
The scales were given on the following dates:
No. 1   Nov. 24-26, 1920   Myers Mental Measure
No. 2   Dec. 4-6, 1920     Pintner's Non-language Tests
No. 3   Dec. 14-16, 1920   Pressey Primer Scale
No. 4   Dec. 20-22, 1920   Army Beta Examination
No. 5   Jan. 4-8, 1921     Dearborn Group Tests of Intelligence
In giving the scales, all the original directions were followed liter-
ally, except in the cases of the Dearborn Group Tests of Intelli-
gence and the Army Beta Examination, both of which were modi-
fied to meet the peculiar needs. The altering of directions for the
Dearborn Tests of Intelligence was very slight. The only change
made was in the "clock" test of General Examination i. The
original direction calls for the subjects to draw in the clock hands,
indicating the time when school begins in the morning, when school
begins in the afternoon, and when school closes in the afternoon.
In the school where the testing was done, starting and closing time
is different for different children. As it was therefore confusing for
the children to answer these questions, the following directions were
substituted: "In the first clock draw in the hands so as to show what
time school assembly begins in the morning. Draw the hands in
the next clock, to show what time school recess begins for lunch.
In the third clock show what time school begins in the afternoon."
As suggested by Dearborn, the tests were given in two periods, but
with one day interval between them.
In the case of the Army Beta Examination, the procedure was
considerably modified. The original directions call for a blackboard
frame consisting of eight fitted sections, a blackboard chart in a
continuous roll 27 feet long, cardboard pieces for Test 7, and pat-
terns for constructing Test 2. It was impossible to get the original
apparatus, so the school blackboard, self-made cardboard pieces,
and real wood cubes were substituted. Furthermore, according to
the original form, it was necessary to have an examiner, a demonstrator,
and a number of orderlies. The demonstrator was charged
with the single task of doing before the group just what the group
was later to do with the examination blanks. The use of a special
demonstrator, as provided for in the original tests, was considered
both superfluous and cumbersome. The examiner also performed
the duty of the demonstrator. As in other scales, he was to give the
directions as well as demonstrate to the class the preliminary test.
The adapted directions were as follows:
Directions for Test 1
As soon as the pupils have been properly seated, and examination blanks distributed,
the examiner says, "Here are some papers. You must not open them
or turn them over until you are told to."
Holding up the Beta blank, the examiner continues: "In the place where it
says name, print your name very clearly. Remember, print your full name.
If you are Mary Jones, print Mary Jones; if you are John Smith, print John Smith.
Right after your name, in the place where it says rank, write your grade. Do
you know in which grade you are? That is fine. Write down your grade very
clearly, so that I can read it. Look over your paper again and show me whether
all of you have written your name and grade very clearly."
Before the examination begins, each paper should be inspected by the assistants
in order to make sure that the name and grade are clearly written. Then the
Examiner remarks, "Attention! Watch what I do on the blackboard. I am
going to do here what you are going to do on your papers. Ask no questions.
Wait till I say, 'Go ahead.' Now is everybody ready? Turn your paper over.
This is Test 1, here (pointing to the page of record blank). Have you found it?"
After all have found the page, the Examiner continues, "Don't make any marks
till I say 'Go ahead.' What I want you to do is to draw a line which shall pass
through the pictures from left to right without touching any line. Now watch
me work on the blackboard." After touching both arrows, the Examiner traces
through the first maze with chalk, slowly and purposely makes one mistake by
going into the blind alley at upper left-hand corner of the maze and asks the class,
"Is this correct?" After the class answers "No," the examiner places his hand
back to the place where he may start right again, and traces through the rest
of the maze, indicating an attempt at haste and hesitating only at ambiguous
points. After this is done, he says, "Everybody ready? All right. Go ahead.
Hurry up." At the end of two minutes, the examiner says, "Stop! Turn over
the page to Test 2."
Test 2
The examiner then continues, "This is Test 2 here. Look!" After everyone
has found the page he says, "I want you to count the cubes and write the number
in the little square below the picture. Now watch me work on these blocks."
The order of procedure is as follows:
a The examiner points to the three-cube model on the blackboard, making
a rotary movement of the pointer to embrace the entire picture.
b With similar motion he points to the three-cube wood model on the desk.
c The examiner next points to picture on blackboard and asks the class, "How
d The examiner turns to cube model and counts aloud, putting up the fingers
while so doing, and encouraging the class to count with him.
e The examiner taps each cube on the blackboard and asks the class, "How
f After the class answers correctly, the examiner counts the cubes on blackboard
silently and writes proper figures in proper places. (The rest is the same
as the original directions).
After the demonstration is completed, the examiner says, "Everybody ready?
All right. Go ahead. Hurry up," and at the end of 2½ minutes he says,
"Stop! Look at me and don't turn the page."
Test 3, X-O Series
"This is Test 3 here. Look." After everyone has found the page, he says,
"I want you to draw in X or O in the proper squares which are empty. Now
watch me work on the blackboard." The examiner first points to the blank
rectangles at the end, then traces each "O" in chart, then traces outline of "O's"
in remaining spaces and draws them in. Then he traces first "X" in next sample,
moves to next "X" by tracing the arc of an imaginary semicircle joining the two,
and in the same manner traces each "X," moving an arc to the next. He then
traces outlines of "X's" in the proper blank spaces, moving over imaginary arc
in each case, and asks the class what should be drawn in. The examiner follows
the answers of the class and fills in remaining problems very slowly. After the
demonstration is finished, the examiner says, "All right. Go ahead. Hurry
up!" At the end of 1¾ minutes he says, "Stop! Turn the page to Test 4."
Test 4, Digit-Symbol
"This is Test 4 here. Look." After everyone has found the page, he says, "I want
you to study each number and memorize the symbol which represents it. Put
in the right symbol under the right number." The examiner touches the number
in first sample with index finger of right hand; holding finger there, finds with
index finger of left hand the corresponding number in key; drops index finger of
left hand to symbol for the number found; holding left hand in this position
writes appropriate symbol in the lower half of the sample. Similar with the other
sample. But for the last three samples the class is asked to give the correct
symbols. At end of the demonstration, the examiner says, "All right. Go
ahead. Hurry up!" At the end of 2 minutes the examiner says, "Stop! But
don't turn the page."
Test 5. Number Checking
"This is Test 5 here. Look." After everyone has found the page, he says, "I want
you to find out whether the two numbers are the same. If they are the same,
write 'X' on the dotted line between them; if they are not the same, write 'O'
on the dotted line between them. Now watch me do this on the blackboard."
In this demonstration the examiner must get "Yes" or "No" responses from the
class. If the wrong response is volunteered by the group, the examiner points
to digits again and gives right response, "Yes" or "No" as the case may be.
The examiner points to the first digit of first number in left column, then to second
digit first number in left column and second first number in right column. He
says "Yes" to the class and marks an "X" on the dotted line between the number.
The examiner does the same for the second line of figures, but here he indicates
by "0." In the last three samples, the class is asked to answer "Yes" or "No."
After the demonstration is over, the examiner points to page and says, "All
right. Go ahead. Hurry up!" At the end of 3 minutes, he says, "Stop!
Turn over the page to Test 6."
Test 6. Pictorial Completion
"This is Test 6 here. Look! A lot of pictures." After everyone has found
the page — "Every picture has something gone. I want you to fix it. Now look
at the pictures on the blackboard." The examiner points to the picture of the
hand, then to the place where the finger is missing and asks the class, "What is
gone?" After the class has given a correct answer, he says, "That's right.
The finger is gone." Then he draws in the finger. Similarly with the other
samples. When the demonstration is finished, the examiner says, "Fix all the
pictures on the whole page. All right. Go ahead. Hurry up!" At the end of
3 minutes, the examiner says, "Stop! but don't turn over the page."
Test 7. Geometrical Construction
"This is Test 7 here. Look." After everyone has found the page — "Here
are blocks. Imagine that you could fill them in this square, and then draw in the
intersecting lines in this square. Now watch me." The examiner points to
the first figure on blackboard. He then takes the two pieces of cardboard, fits
them on the similar drawing on blackboard to show that they correspond and puts
them together on the square on blackboard to show that they fill it. Then, after
running his finger over the line of intersections of the parts, he removes the pieces
and draws the solution in the square on the blackboard. Similarly for the other samples.
At the end of the demonstration, the examiner holds up the blanks, points to each
square on the page and says, "All right. Go ahead. Hurry up!" At the end of
3}i minutes he says, "Everybody stop!" Papers are then collected by monitors
immediately.
While the children were doing the tests, a general impression of
their attitude was recorded. As a whole they showed fine spirit,
worked enthusiastically, and seemed to enjoy the work. Judged by
their manner, they seemed especially interested in the Pressey
Absurdity Test and in all the Pictorial Completion Tests. But in
the case of the Dearborn Tests, the majority of the children were
bewildered by lack of clearness in the directions and showed signs
of fatigue due to the over-long time required.
Practically all the tests were scored by the writer himself, with
great care. The Dearborn Group Tests of Intelligence were found
to be the most difficult to score. It took an average of fifteen
minutes to score a child's paper containing the three examinations.
The amount of time required in scoring the Dearborn Tests seemed
greatly out of proportion to the results obtained.
CHAPTER III
FORMATION OF A CRITERION
The chief object of the present study is to select the best group
of tests from the five intelligence examinations which were given
to the children in New York City Public School No. 108, with a
view to modifying them for use in China. In order to do so, it is
necessary to have a definite, constant criterion with which to com-
pare tests. This criterion should be made up from as many factors
as possible that are known to be indices of the constituents and
development of general intelligence. These factors must be reliable
indicators if the criterion, which is depended upon to determine
the value of the selected tests, is to be trustworthy. In this chapter,
an account of the selection of the best elements to be included in the
criterion with which to compare the tests is given.
The elements of the criterion adopted are: (1) age, (2) school
marks, (3) school progress, (4) teachers' estimates of intelligence,
and (5) composite test scores of (a) Dearborn Group Tests of
Intelligence, (b) Pintner Non-language Tests, (c) Army Beta
Examination, (d) Myers Mental Measure, and (e) Pressey Primer
Scale. Certain weights, to be described later, are given to age,
school marks, school progress, and teachers' estimates of intelligence,
and to each of the mental test scores; the weighted values are then
combined into one rating called the Final Criterion.
A. ELEMENTS OF THE CRITERION
The elements which may be included conveniently in a criterion
for pupils' intelligence are age, teachers' estimates, school marks,
school progress and test scores. All of these measure general intel-
ligence in different ways, though their values are not equal. Some
or all of them should be used in combination and given weight in
reference to their special significance in showing the presence of
intellectual ability.
1. Chronological Age.
The chronological ages of the children were copied directly from
the school record, in order that they might be accurately known.
As the administering of the intelligence scales extended from
November 24, 1920, to January 8, 1921, the median date was
taken as a standard to calculate the ages, that is, December 17,
1920. All the ages, therefore, shown on the record book date from
the birth of the individual up to December 17, 1920.
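The age reckoning described above can be sketched in a few lines. This is only an illustration: the function name and the sample birth date are hypothetical, and the text does not say how ages were rounded to the half-year steps used later, so the sketch simply returns completed years and months.

```python
from datetime import date

# Median testing date given in the text.
REFERENCE_DATE = date(1920, 12, 17)

def age_years_months(birth, on=REFERENCE_DATE):
    """Return completed (years, months) from birth up to the reference date."""
    months = (on.year - birth.year) * 12 + (on.month - birth.month)
    if on.day < birth.day:
        months -= 1  # the current month is not yet complete
    return divmod(months, 12)

# Hypothetical pupil born 3 June 1912.
print(age_years_months(date(1912, 6, 3)))
```

Using one fixed reference date keeps the ages comparable even though the tests were given over several weeks.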
The dependence of intelligence upon age in adults is a theoretical
problem, but gradual mental growth in children is accepted by all
psychologists as beyond dispute. Binet, Terman, Thorndike, and
others have all found that the general intelligence of a child gradually
develops as his age advances until he reaches maturity. Kelley¹
and Fretwell,² according to their experimental studies, find that
there is a negative correlation between age and achievement — an indication
that with pupils in the same grade the younger are the brighter
ones. Since all the subjects in this study are below fifteen years of
age, it is evident that age should be considered in the making of the
criterion for the selection of the tests, but that the young child in an
advanced grade should be given a bonus, and the older child in the
early grade a demerit in utilizing age as a criterion of intelligence.
Therefore an age distribution table was prepared and a numerical
value was assigned to different ages in different sections of the
grades. As the ages for the sexes were different, so the values
assigned were also different. For instance, in section B of the
second grade boys, 9 was assigned to 7 yrs. 6 mo.; 8 to 8 years;
7 to 8 yrs. 6 mo.; 6 to 9 years; 5 to 9 yrs. 6 mo.; 4 to 10 years;
and 3 to above 10 years. For a complete record, see Table I.
2. Teachers' Estimates.
A teacher associates with children daily. She knows in a general
way a pupil's strong points as well as his weak ones. Her estimate
of his general intelligence should be accurate in some particulars
if she clearly understood that her rating was to be based upon native
intelligence and not school achievement. Fretwell found that the
correlation of the composite of teachers' judgments of pupils with
the composite of eleven tests was .66. Kelley found that "the cor-
¹ Kelley, T. L.: Educational Guidance.
² Fretwell, E. K.: A Study in Educational Prognosis.
TABLE I
Distribution of Ages and the Numerical Value Assigned to Each Age
[Table: for boys and for girls separately, the number of pupils (No.) and the assigned value at each age step (Yr. Mo.) from 7-0 upward, in Grades 2B, 3A, 3B, 4A and 4B.]
relation between class standing and the regression equation com-
bination of the estimates of traits by teachers" was .76. He re-
marked, as a result of his investigation, "With such a high correla-
tion, a division of pupils into classes by means of teachers' estimates
would be highly reliable."¹ Teachers' estimates are not enough,
however, because a teacher may overemphasize some factors and
neglect others. Terman found that teachers frequently err in
estimating general intelligence because they neglect to consider
age and emotional differences. Whipple found that teachers
estimate the dull children too high and the bright children too low.
In this study, teachers were requested to separate their children
into five classes A, B, C, D, and E on the assumption that "Intel-
ligence is a general capacity of an individual consciously to adjust
his thinking to new requirements: it is general mental adaptability
to new problems and conditions of life."² They were asked to rate
few as A's or E's, comparatively more as B and D, and a larger
number as C. The teachers were warned not to grade the intel-
ligence of their children by their school achievement and deport-
ment but by their general abilities or brightness shown both in their
academic work and extra-curricular activities. In order to be fair
to the children, the teachers were requested to grade their children
independently three times, November 24, 1920, December 16,
1920, and January 11, 1921. The dates were sufficiently far apart
so that the teachers scarcely remembered their previous marks. An
aggregate of the three estimates was taken as the estimate of the
teacher for the general intelligence of the child.
In order to make possible statistical treatment, the letters A,
B, C, D and E, given by the teachers, were transmuted into
numerals. They are shown as follows:
¹ Kelley, T. L.: Educational Guidance, p. 16.
² Stern, W.: "The Psychological Methods of Testing Intelligence," translated by
G. M. Whipple, Educational Psychology Monographs, No. 13, p. 3.
3. School Marks.
School marks have been the most universal method used for
grading pupils. In the past, it has been the only method of judging
the ability of the children recorded in school reports. While it is
true that teachers often do not agree with each other, yet school
marks are a fair measure of mental ability. Fretwell found the correlation
between school marks and a group of tests as high as .57.¹
McCall says, "Teachers' marks are important because they are
now and will continue for some time to be the most universal method
of rating pupils. In fact, they may continue forever to be the
criterion for classification because teachers will soon be familiar
with the simple mysteries of scientific measurement."²
In the school in which this experiment was carried on, there were
weekly, monthly, and term examinations. The teachers mark the
children by letters. The marks used in this study are the average
school marks of the children in the fall term of 1920-21. For the
convenience of statistical study, the school marks were turned into
figures as follows:
School Marks          Numerical Value Assigned
4. School Progress.
By school progress is meant the progress which a child has made
in the school, that is, his present class standing. The very reason
that one could be promoted to a certain grade and maintain his
standing there shows that he must have the mental ability to handle
the subjects. When a pupil fails to make satisfactory progress in his
school work, he is ordinarily retarded or eliminated. It is clear, there-
fore, that advance in grade usually indicates development of intelli-
gence, although there may be exceptions. Sometimes the school per-
mits a pupil to move up a grade or class even though he has not done
the work below, because the parents of the child insist upon it; or
because the teacher wants to get rid of the backward child; or
because the school must make room for younger pupils. However,
the majority of the pupils are promoted because their mental
ability permits the expected scholastic attainment, and therefore
the grade reached should be utilized in building up the criterion for
the selection of tests.
In Public School No. 108, classes are divided into A and B sec-
tions, and pupils are promoted by sections twice a year, A being the
lower section. The following numerical values were assigned to the
different sections of the different grades:
                 Numerical Value Assigned
Grade IIB                   0
Grade IIIA                  5
Grade IIIB                 10
Grade IVA                  15
Grade IVB                  20
5. Test Scores.
All tests given are standardized. They all are claimed to correlate
highly with general intelligence. The correlation between Myers
Mental Measure and Stanford-Binet Scale was reported to be .80;
between Pressey Primer Scale and Stanford-Binet Scale, .60;
between the Army Beta Examination and Stanford-Binet Scale,
.73; and between Dearborn Group Tests of Intelligence and Stan-
ford-Binet Scale, .87. It is safe to assume that if the individual
scales are so valid as a measure of intelligence, a combination of the
test scores of all these scales would result in an excellent measure-
ment of general intelligence. Based upon this assumption, the com-
bined test scores were included in the final criterion.
B. TEST SCORES WEIGHTING
In order to study the combined value of all the scales given to
the children, it was necessary to have a composite of all the test
scores. This could be done by summing all the raw test scores of
the different scales. But the merits and variabilities of the scores
of the different scales are different. To sum the raw scores is,
therefore, unfair. The problem then to be next considered was how
to weight the different scales properly.
It was important to know the merits of the different scales, when
a weight was attached to each. One of the simplest methods for
finding them was to prepare an age or grade distribution table and
inspect the slope shown. This was based on the assumption that a
child, as he advanced in age and grade, should make a higher score
in an intelligence test. This gradual increase of scores in propor-
tion to the advance in age and grade permits the appearance of a
slope on the distribution table. When scores for a given age are
near together and on the whole greater for each increased age, which
is shown graphically by their clustering about the slope line, the
more valuable is the scale.
Based on the above assumption, age and grade distribution tables
of each scale were prepared for both sexes. An inspection of the
tables shows the existence of some slope in all the scales; Pintner
Non-language Tests, Army Beta Examination, and Dearborn Group
Test, however, seemed better than the rest. All of the tables could
not be shown here, but for illustration, the distribution of Pintner's
Non-language Tests is given in Table II. Attention is called to the
slope and the gradual increase of scores as expressed by the medians,
from 50.83 for age 8 to 106.5 for age 11.
Another rough method for finding the merits of the different scales
is to compute the extent of overlapping of the two groups of scores.
The assumption is this: the less overlapping in different grades the
tests show the better measures of intelligence they are. For instance
a good scale should show the differences in mental traits between
the child in Grade III and the child in Grade IV. The more the scale
can indicate the difference, the more reliable the scale. Such over-
lapping can be computed by comparing the two total distributions
of the test scores by stating the variabilities of the two groups and
their central tendencies. The method used in this study, however,
is a shortened one, based on the following formula:¹
$$\text{Per cent overlapping of } A \text{ over } B = \frac{\text{No. of cases in } A \geq \text{median of } B}{N_A}$$
To illustrate the method, the data in Table III are taken. In
this case A in the formula means Grade III and B Grade IV. The
median for Grade IV is 67.5, which falls midway in the step 65-70.
¹ See Thorndike, E. L.: Mental and Social Measurements, p. 118 ff.
TABLE II
Age Distribution Showing the Slope and the Increase of Scores as Age
Advances; Data from Boys Who Have Taken the Pintner Non-Language Tests
[Table: tally of cases at each age in score steps of 5 points from 0-5 to 130-135, followed by the Number of Cases, Median and Quartile for each age.]
TABLE III
Grade Distributions of the Pressey Scale
Score              Grade III    Grade IV
85-90                  —            9
80-85                  5            8
75-80                  9           26
70-75                 18           19
65-70                 17           17
60-65                 23           16
55-60                 20           20
50-55                 10           10
45-50                 13            8
40-45                 15
35-40                  7
30-35                  3
25-30                  4
20-25                  3
15-20                  1
10-15
5-10                   1
0-5
Number of Cases      149          141
Median            59.375         67.5
Quartile            10.6          9.6
The number of Grade III pupils who equal or exceed this score is
therefore 17/2 + 18 + 9 + 5, or 40.5, which is 27 per cent of the
number in the third grade, 149. The per cent of overlapping of the
third and fourth grades is, therefore, 27 per cent. It is illustrated
by Fig. 5.
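The shortened overlap computation can be checked with a short sketch. The function name and the dictionary layout are illustrative assumptions; the frequencies (17, 18, 9 and 5), the Grade III total of 149 and the Grade IV median of 67.5 are the figures quoted in the text and Table III. As in the worked example, cases in the step containing the median are assumed to be spread evenly over it.

```python
def percent_overlap(freq_a, n_a, median_b, step=5):
    """Per cent of group A scoring at or above the median of group B.

    freq_a maps the lower bound of each score step to the number of
    group A cases in that step; only steps reaching the median matter.
    """
    at_or_above = 0.0
    for lower, count in freq_a.items():
        upper = lower + step
        if lower >= median_b:
            at_or_above += count  # whole step lies at or above the median
        elif upper > median_b:
            # Median falls inside this step: count the fraction above it.
            at_or_above += count * (upper - median_b) / step
    return 100.0 * at_or_above / n_a

# Grade III frequencies in the steps 65-70, 70-75, 75-80 and 80-85.
grade_iii_upper_steps = {65: 17, 70: 18, 75: 9, 80: 5}
overlap = percent_overlap(grade_iii_upper_steps, n_a=149, median_b=67.5)
print(round(overlap, 1))
```

The 40.5 cases at or above the median are 17/2 + 18 + 9 + 5, which gives roughly 27 per cent of 149.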
By this method the per cents of overlapping were computed for
Grade III and Grade IV in all the scales. The results were as
follows (for illustrations, see Figs. 6, 7, 8 and 9):
Scale         Per Cent Overlapping    Value
Dearborn              9.8               1
Army Beta            12                 2
Pintner              15.2               3
Myers                21                 4
Pressey              27                 5
Fig. 5. Showing 27 percent overlapping of Grade III over Grade IV in the scores of Pressey Primer Scale.
Fig. 6. Showing 21 percent overlapping of Grade III over Grade IV in the scores of Myers Mental Measure.
Fig. 7. Showing 15.2 percent overlapping of Grade III over Grade IV in the scores of Pintner's Non-language Tests.
Fig. 8. Showing 12 percent overlapping of Grade III over Grade IV in the scores of Army Beta Examination.
Fig. 9. Showing 9.8 percent overlapping of Grade III over Grade IV in the scores of Dearborn Group Tests of Intelligence.
The Dearborn Group Tests of Intelligence, the Army Beta
Examination, and the Pintner Non-language Tests, which were
found better than the others according to the slope method, also
stand high here. However, tests should be ultimately weighted
according to the variabilities of their scores; the range and deviations
from the averages should be taken into consideration. The measure
of variability used in this study is Q or quartile-deviation. Q is
that distance on the base line of the normal curve which includes
roughly half of the measures, when laid off on each side of the average.
It is computed by
$$Q = \frac{Q_3 - Q_1}{2}$$
That is, Q = half of the distance between the 75 percentile and 25 percentile.
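A minimal sketch of the quartile deviation follows. It works on a plain list of scores and uses the standard library's quantile interpolation, which is an assumption: the study presumably worked from grouped frequency tables, so its figures need not agree to the last decimal. The function name and sample scores are hypothetical.

```python
import statistics

def quartile_deviation(scores):
    """Half the distance between the 75th and 25th percentiles."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return (q3 - q1) / 2

# Hypothetical scores for one age group.
print(quartile_deviation([10, 20, 30, 40, 50]))
```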
Q was computed for ages 8, 9, 10 of both boys and girls as shown
in Table IV. The sum of these Q's in the different scales is 62.1
TABLE IV
Weighting of the Scales According to Q
                   Age   Pressey   Pintner   Myers   Beta   Dearborn
Boys                8     11.5      10.5      5.4    13.3     39.0
                    9      8.6      17.1      5.1    11.5     23.0
                   10     11.7      13.5      5.6    11.6     23.0
Girls               8     14.8      19.2      4.0     9.8     26.0
                    9      8.6      28.8      5.2    10.2     16.0
                   10      6.9      19.0      4.5    12.9     23.0
Total                     62.1     104.1     29.8    69.3    150.0
Abbrev. Total              6        10        3       7       15
Multiplier                 1         1        2       2        1
Resulting Weight           6        10        6      14       15
for Pressey Primer Scale, 104.1 for Pintner Non-language Tests,
29.8 for Myers Mental Measure, 69.3 for the Army Beta Examination,
and 150 for the Dearborn Group Tests of Intelligence. These
numbers were then reduced, for convenience, to 6, 10, 3, 7, 15
respectively for five different scales. These values of the Q's show
that, if the raw scores of the different scales were summed up just
as they appear, the Dearborn Scale with its Q of 15 would have five
times as great weight as the Myers Scale with its Q of 3; it would
give Army Beta Scale almost the same weight as the Pressey Scale,
and these weights did not appear to correspond with the real value
of the tests. After several trial weightings, it was finally decided
to multiply the Myers Scale scores and the Army Beta scores by 2,
and the other scores by 1. The results showed that they were thus
weighted fairly as their value corresponded roughly with the results
previously found by the overlapping method. Army Beta Exam-
ination was found to be one of the best scales, and it should have at
least as much weight as the Dearborn Scale. Although the Myers
Scale was not considered one of the best, it was fair to assume that
it should carry weight equal to the Pressey Scale. The following
table indicates the weights:
Dearborn      1
Army Beta     2
Pintner       1
Myers         2
Pressey       1
After the raw scores of the different scales were weighted, they
were summed up to get a composite score for each individual.
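The weighting step can be illustrated as follows. The abbreviated Q totals (6, 10, 3, 7, 15) and the multipliers (2 for Myers and Army Beta, 1 for the rest) come from the text; the dictionary names, function names and the sample raw scores are hypothetical.

```python
# Abbreviated sums of Q per scale and the multipliers finally adopted.
Q_ABBREV = {"Pressey": 6, "Pintner": 10, "Myers": 3, "Army Beta": 7, "Dearborn": 15}
MULTIPLIER = {"Pressey": 1, "Pintner": 1, "Myers": 2, "Army Beta": 2, "Dearborn": 1}

def effective_weights():
    """Weight each scale carries in the composite: its Q times its multiplier."""
    return {scale: Q_ABBREV[scale] * MULTIPLIER[scale] for scale in Q_ABBREV}

def composite_score(raw):
    """Sum of a pupil's raw scale scores after applying the multipliers."""
    return sum(MULTIPLIER[scale] * raw[scale] for scale in MULTIPLIER)

print(effective_weights())

# Hypothetical raw scores for one pupil.
pupil = {"Pressey": 60, "Pintner": 90, "Myers": 20, "Army Beta": 40, "Dearborn": 100}
print(composite_score(pupil))
```

Doubling Myers and Army Beta raises their effective weights from 3 and 7 to 6 and 14, which puts Myers level with Pressey and Army Beta close to Dearborn.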
C. METHOD OF SELECTION OF THE FINAL CRITERION
After consideration of the various facts known about the sub-
jects, and inspection of their correlations with the composite test
score, the following composite (termed school criterion) of age,
school marks, teachers' estimates and school progress was tried.
As previously explained (page 29) numerical values were assigned
to different ages, so that a young child in an advanced grade receives
more credit than an older child in the same grade (see Table I).
Likewise, teachers' estimates of intelligence and school marks (see
pages 29, 31-32), both of which were registered in letters, were transmuted
into numbers. Numerical values were also assigned to the
grades reached (see pages 32, 33). To illustrate the procedure, ten
cases are shown in Table V. Pupil A received a credit of 9 for her
age, 10 for her school marks, 12 for teachers' estimates of her intelli-
gence and 20 for her school progress. Similarly, pupil J received a
credit of 3 for his age, 5 for his school marks, 4 for his teachers'
estimates of his intelligence and for his school progress. In the
same way, credits were assigned to all the elements of the school
criterion for each pupil.
This seemed a reasonable weighting of the facts. Their correla-
tions with the composite test score were .71 for the boys and .91
for the girls, with an average of .81. Since we may assume that
the composite average of all the intelligence tests is a fairly true
measure of intelligence, these high correlations are evidence that
the school criterion is reasonable.
TABLE V
Data for School Criterion (10 Selected Pupils)
[Table: for pupils A to J, the chronological age, school marks, teachers' estimates and school grade, each with its credit, and the total credit.]
However, a combination of the composite test score and the
school criterion might be still more useful. So, we combine them
into what we have called the Final Criterion. The S.D. for the
school criterion is 8 and that for the test score 12. It seems desirable
to give each equal weight; therefore, the raw scores of the school
criterion were multiplied by 3 and the composite test scores by 2,
so that the two weighted standard deviations agree (3 X 8 = 24 and 2 X 12 = 24).
This may be expressed by an equation as follows:
Final Criterion = (3 X School Criterion) + (2 X Weighted Test Scores)
that is,
Final Criterion = [3 X (Age + School Marks + Teachers' Estimates + School Progress)] + [2 X (Dearborn + Pressey + Pintner + 2 X Army Beta + 2 X Myers)]
For further explanation, see Table VI, which contains data for
ten pupils. Similarly, the final criterion was calculated for all the
pupils.
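The equation can be sketched in code. The school credits used below are those quoted for pupil A (9, 10, 12 and 20); the weighted test total of 370 is a hypothetical value, and the function names are illustrative only.

```python
def school_criterion(age, marks, teachers, progress):
    """Sum of the four school credits."""
    return age + marks + teachers + progress

def final_criterion(school_total, weighted_test_total):
    """3 x school criterion plus 2 x weighted test total."""
    return 3 * school_total + 2 * weighted_test_total

# The multipliers 3 and 2 equalize the two spreads (S.D. 8 and 12).
assert 3 * 8 == 2 * 12

school_a = school_criterion(age=9, marks=10, teachers=12, progress=20)
print(school_a)
print(final_criterion(school_a, weighted_test_total=370))
```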
The Final Criterion is the standard used to select the best test
elements from the five intelligence scales for development into a
valid and reliable measure of intelligence for use in China. The
success of the work is, therefore, largely dependent upon the validity
and reliability of the criterion. Now the questions arise: Is the final
criterion valid and reliable? Are not the elements which made up
the final criterion repeating themselves? Is it right to include the
scores of tests in the final criterion and then use the combination
with the tests elements? It is admitted here that the criterion ele-
ments do overlap and there is no line of demarcation to differentiate
them. For instance, when a teacher estimates the general intelli-
gence of a child she considers his age and school achievement; and
school progress involves many factors such as age, school marks and
teachers' estimates. But there is no doubt that every element
measures something which is somewhat different from that which the
other elements measure, that no two of them measure exactly the
same traits. Furthermore, all the criterion elements, as explained
before, are in some degree measures of general intelligence and
each of the five scales has been reported to be a reliable intelligence
test. A combination of all these factors certainly should make the
final criterion reliable. Finally it must be kept in mind that the
purpose of this study is to select the best test elements from the five
scales; and the criterion required is simply a definite constant stand-
ard. It really makes little difference whether the criterion ele-
ments to some extent overlap in their functions, for the final cri-
terion will be applied uniformly to the tests elements.
TABLE VI
Data for Ten Selected Pupils for Calculation of the Final Criterion
Pupils
School Criterion
  Age
  School marks
  Teachers' estimates
  School progress
  School Criterion Total
  3 X School Criterion Total
Test Score
  Pressey
  Myers
  Army Beta
  Dearborn
  Test Total (Pressey + Pintner + 2 X Myers + 2 X Beta + Dearborn)
  2 X Test Total (abbreviated)
Final Criterion [(3 X School Criterion Total) + (2 X Test Total)]
CHAPTER IV
SELECTION OF TEST ELEMENTS
A. SELECTION OF TEST ELEMENTS BY CORRELATION METHOD
The ultimate aim of this study is to select the best single tests
from the five intelligence scales, with the hope that they may con-
stitute a non-verbal intelligence scale for use in China. Chapter III
has discussed the "final criterion." The present task is to utilize
it as a basis for selection. For this purpose the correlations of every
single test of the five scales with the final criterion have been worked
out. It is assumed that any test element which correlates highly
with the final criterion is good. This, however, does not mean that
all the tests which correlate highly with the final criterion should
be adopted in the Chinese Scale. A high correlation between two
tests may be because they measure the same traits; and the corre-
lations so obtained are simply self-correlations. A good intelligence
scale should measure a combination of different traits, so the test
elements in the scale should measure as many different mental traits
as possible. Consequently, the ultimate object should be to select
those test elements which individually correlate highly with the
final criterion but which correlate but little with each other. The
writer has adopted r = .80 as a standard. It is aimed to discover a
group of test elements from the five scales, which, combined to-
gether, will give a correlation above .80 with the final criterion.
Scattergrams were prepared charting every single test element
against the final criterion, an inspection of which showed the following
to have high correlations.
Pressey Primer Scale, test 4
Pintner Non-language Tests, tests 2 and 5
Myers Mental Measure, test 2
Army Beta Examination, tests 4, 5 and 6
Dearborn Group Tests for Intelligence, Series I:
General Examination 1, test 17
General Examination 2, test 4
General Examination 3, test 1
After the scattergrams were inspected, the next step was to
determine roughly the correlations of all the tests. The formula
used is Sheppard's, $r = \cos \pi U$, where $U$ is the "percentage of unlike-signed
pairs,"¹ and
U -
im\
n = the number of cases
l = the number of + + and − − pairs
u = the number of + − pairs
d = the number of 0 0, 0 + and 0 − pairs
All the correlations which, by this method, were found to be above
.60 were computed also by the product-moment method. Table
VII shows the results as found.
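Sheppard's rough method can be sketched as below. Two points are assumptions rather than the study's stated procedure: signs are taken as deviations from each variable's median, and U is taken as the share of unlike-signed pairs among the signed pairs only, with zero pairs left out. The function name and sample data are hypothetical.

```python
import math
from statistics import median

def sheppard_r(xs, ys):
    """Estimate r as cos(pi * U), U being the share of unlike-signed pairs."""
    mx, my = median(xs), median(ys)
    like = unlike = 0
    for x, y in zip(xs, ys):
        sx = (x > mx) - (x < mx)  # +1, -1 or 0 relative to the median
        sy = (y > my) - (y < my)
        if sx == 0 or sy == 0:
            continue  # zero pairs are ignored in this sketch
        if sx == sy:
            like += 1
        else:
            unlike += 1
    if like + unlike == 0:
        raise ValueError("no signed pairs to compare")
    u_share = unlike / (like + unlike)
    return math.cos(math.pi * u_share)

# Hypothetical test scores against criterion scores.
print(sheppard_r([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5]))
```

Because only signs are counted, the estimate takes few distinct values in small samples, which fits its use here as a screening step before the product-moment computation.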
Among these tests, the two types which appear the most promis-
ing are the completion tests and learning tests. Other workers in
this field find similar results. Each of these was used by the makers
of three of the five scales tried out and was included in their final
forms because of its value as an independent measure of intelligence.
Consequently, these two types of tests have been made the core of
the proposed Chinese scale. The other elements to be chosen should
not correlate highly with these two combined, since any other test
which does correlate highly with them probably measures the same
traits and, consequently, would add little to the measurement. The
learning and completion tests selected were those from Army Beta
(tests 4 and 6) rather than from the others, because this scale has
had a wider use and more searching criticism than any of the others.
With these as a basic group, the correlations with every other test
in all the five scales were made. Table VIII shows the results.
However, all the completion and learning tests show high corre-
lations with the final criterion and certain of the correlations of
the tests against Beta 4 and 6 combined give promise. But to
explore further to see whether a better basal combination could
¹ Thorndike, E. L.: An Introduction to the Theory of Mental and Social Measurements, pp. 170-71.
TABLE VII
Correlations of Individual Tests with Final Criterion by Sheppard's and Product-Moment Methods
[Table: correlation of each individual test of the Pressey, Pintner, Myers, Army Beta and Dearborn scales with the final criterion.]
TABLE VIII
Tests              No. of Cases    r (Sheppard)
Pressey 1              346             .45
Pressey 2              338             .51
Pressey 3              341             .45
Pressey 4              344             .61
Pintner 1              312             .48
Pintner 2              313             .43
Pintner 3              313             .61
Pintner 4              313          [illegible]
Pintner 5              312             .48
Pintner 6           [illegible]        .34
Myers 1                297             .45
Myers 2             [illegible]        .61
Myers 3                319             .51
Myers 4                392             .56
Army 1                 371             .66
Army 2                 370             .43
Army 3                 370             .51
Army 5                 368             .58
Army 7                 374             .33
Dearborn I 7           334             .45
Dearborn I 8           334             .36
Dearborn I 9           335             .19
Dearborn I 10          333             .36
Dearborn I 11          331             .33
Dearborn I 13          336             .19
Dearborn I 15          329             .31
Dearborn I 16          331             .56
Dearborn I 17          397             .48
Dearborn II 1          341             .45
Dearborn II 2          342             .51
Dearborn II 3          340             .31
Dearborn II 4          340             .37
Dearborn II 5          343             .34
Dearborn II 6          303             .28
Dearborn II 7          305             .37
Dearborn III 1         266             .48
Dearborn III 3         341             .45
Dearborn III 4         303             .51
Selection of Test Elements 47
be made, correlations were worked out between the criterion
and various other combinations of tests. The results are shown in
Table IX.
TABLE IX
Correlation Between                                       No. of Cases    r (Pearson)
Final Criterion and Pressey                                   337            .58
Final Criterion and Pintner                                   235            .78
Final Criterion and Myers                                     235            .65
Final Criterion and Army                                      235            .75
Final Criterion and Dearborn                                  235            .80
Final Criterion and Dearborn I                                235            .69
Final Criterion and Dearborn II                               235            .76
Final Criterion and Dearborn III                              235            .67
Final Criterion and Dearborn I, 1-6                           330         [illegible]
Final Criterion and Dearborn I, 7-15                          334            .63
Final Criterion and Pintner 2+3                               236            .73
Final Criterion and Army 3, 4, 5, 6                           232            .714
[illegible]                                                   234            .711
Final Criterion and Army 3, 4, 5, 6 + Pressey 2, 4         [illegible]       .696
Final Criterion and Pressey 2, 4                              233            .47
Final Criterion and Army 4, 6 + Pressey 2, 4                  235            .56
Final Criterion and Pintner 2, 3, Army 6                      233            .815
Army 3, 4, 5, 6, and Pressey 3, 4                             235            .38
Pressey 4 and Army 5                                       [illegible]       .49
Pintner 2 and Pintner 3                                       337            .73
Pintner 2 and Army 6                                          313            .28
Pintner 3 and Army 6                                          313            .37
Pintner 3 and Dearborn I                                      313            .614
Pintner 3 and Dearborn II                                     315            .551
Pintner 3 and Dearborn III                                 [illegible]       .57
Here are shown significant results, establishing the fact that a
better combination than Army Beta 4 and 6 is Pintner's 2 and 3 and
Army Beta 6, its correlation with the final criterion being .815. They
are, however, still learning and completion tests. Tests 2 and 3 of
Pintner's scale both correlate low with Test 6 of the Army Beta
(.28 and .37). These two types of tests really measure different
traits. Pintner 2 and 3 are both included rather than either one
alone because they really form a single test.*
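The comparison of combinations described above rests on two operations: summing the scores of several tests pupil by pupil, and correlating that sum with the criterion by the product-moment method. A minimal sketch in Python follows. The function names and the five pupils' scores are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 cases")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def composite(*test_scores):
    """Add several tests' scores pupil by pupil to form a combined score."""
    return [sum(parts) for parts in zip(*test_scores)]


# Hypothetical scores for five pupils (illustration only).
pintner_2 = [12, 15, 9, 20, 17]
pintner_3 = [10, 14, 8, 18, 15]
army_6 = [5, 9, 4, 7, 10]
criterion = [40, 55, 30, 62, 60]

basic = composite(pintner_2, pintner_3, army_6)
r_basic = pearson_r(basic, criterion)
```

The same two functions serve for any other combination: pass a different set of test score lists to `composite` and correlate the result with the criterion.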
These three tests finally selected were now termed "The Basic
Tests." They take only ten minutes to perform. The other tests
to be included with these must be of a different type. This could be
found out by correlating the individual tests with the basic tests.
The correlations between the basic tests and the individual tests
were computed and compared with their correlations with the final
criterion, as shown in Table X. The results indicated that the other
tests were fairly good as independent measures because their
correlations with the basic tests were in general lower than with the
criterion. Test 4 of the Pressey scale and Test 7 of Dearborn
Examination I were the best, as their correlations both were below .30.
* A second significant correlation shown in Table IX is that of the final criterion and
the entire Dearborn test (.80). However, this should not be interpreted as proof that
the Dearborn Group Tests are the best of the five scales. They take more than two
hours to finish, and consequently the high correlation may be due to practice effect.
Any test if prolonged might result in a fairly high correlation. No single test in the
Dearborn battery, however, correlates higher than .58 (see Table VI) with the final
criterion.
It is worth noting (see Table VIII) that when Tests 1 and 6 are eliminated from
Dearborn Group Examination I, little change in the total correlation is made; also
that Group Examinations I, II and III each has almost the same value as the other,
the correlations being .69, .76, .67 respectively. Each part of the Dearborn Scale,
when used as a single measure of intelligence, is better than the Pressey Scale and just
as good as the Myers Scale. Each of the three parts of the Dearborn Scale also
correlates fairly high with the Pintner Scale, which also indicates the value of each
part as a measure of intelligence.
As a whole test, the Pressey Primer Scale seems to be the poorest of the five scales
used in this experiment. Its correlation with the final criterion is only .58. Tests 2
and 4 were found better than the other two tests, but their correlation with the final
criterion was only .47. The correlations were not raised when combined with Tests
4 and 6 or Tests 3, 4, 5 and 6 of the Army Beta Examination.
According to this investigation, Myers Mental Measure was better than the Pressey
Primer Scale, but it was inferior to the other three scales. The individual tests,
however, all showed fairly high correlations with the final criterion.
Army Beta Examination as a whole had a correlation of .75 with the final criterion,
which was good. When only the combined scores of Tests 3, 4, 5 and 6 were
correlated with the final criterion, the result was r = .714; and the correlation between
Tests 4 and 6 alone and the final criterion gave just as good a result (.711). This proved
that Tests 4 and 6 were the best test elements for our purpose in the Army Beta
Examination. The conclusion was further confirmed when these two tests, combined
with tests from other scales, failed to raise the correlation higher than .711 (see Table
IX).
Other things as well as the correlation being taken into account, Pintner's
Non-language Scale seemed to give the best measure of intelligence, because (a) it
correlated highly (.78) with the final criterion, (b) it did not take a long time to give,
and (c) it was easy to score. The individual tests also correlated highly with the final
criterion. Pintner Tests 2 and 3 with Test 6 of the Army Beta stood highest among all
the individual tests in the five scales.
TABLE X
CORRELATIONS BETWEEN THE INDIVIDUAL TESTS AND THE BASIC TESTS
(PINTNER 2, 3 AND BETA 6)
Name of Test         No. of Cases    r (Pearson)
Dearborn I-7             287
Dearborn I-10            291
Dearborn I-11            289
Dearborn II-3            293
Dearborn II-4            394
Dearborn II-7            395
Dearborn III-1           393
Dearborn III-3           293            .36
Army 4                   313            .58
Army 5                   309
Myers 1                  353
Myers 2                  378
Myers 3                  280
Myers 4                  363
Pressey 4                391
However, as there are other factors to be considered in the selec-
tion of the tests besides the correlations, the tests to be combined
with the basic tests were not finally selected pending further inves-
tigation.
B. SELECTION OF TESTS BY RATING
The rating method is not so accurate as the correlation method,
but when the results of the latter are known the former can be wisely
used to help in the selection of tests. Sometimes the judgments of
specialists are as valuable as objective computation.
On October 20, 1921, the members of the psychology seminar at
Teachers College, who are instructors and graduate students in the
field of measurement, were asked to rate the individual tests in the
different scales. A copy of the test material was distributed to each
member and the instructions for administering the tests were read
to them. They were then asked to rate two characteristics of each
individual test as follows:
a. Can many alternative forms be prepared for the test? Assign
a value of 10 to 10 or more alternative forms, 0 value for no
alternative forms, and the others in proportion.
b. Is success in doing the test due to verbal instruction? Assign
a value of 10 if the success in doing the test is entirely
independent of verbal instruction, a value of 5 if the success is
fairly due to the verbal instruction, 0 value if the success is
entirely dependent upon the verbal instruction, and other
values in proportion.
The results of the rating are shown in Table XI.
It was assumed that the instructors and the writer, being familiar
with tests and their making, would be better judges than the members
of the class, and therefore their judgments were weighted four
times as heavily as those of the students.
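The fourfold weighting can be illustrated with a short Python sketch. The text does not state how the weighted ratings were averaged; the weighted arithmetic mean below is an assumption, and the ratings shown are invented for illustration.

```python
def weighted_rating(instructor_ratings, student_ratings, instructor_weight=4):
    """Mean rating in which each instructor counts as several students.

    Assumption: a weighted arithmetic mean; the source gives only the
    relative weight (four to one), not the averaging procedure.
    """
    total = instructor_weight * sum(instructor_ratings) + sum(student_ratings)
    count = instructor_weight * len(instructor_ratings) + len(student_ratings)
    if count == 0:
        raise ValueError("no ratings supplied")
    return total / count


# Hypothetical ratings of one test on the 0-10 scale.
example = weighted_rating(instructor_ratings=[10, 8],
                          student_ratings=[6, 7, 5, 6])
```

With these invented figures the two instructors contribute 72 of the 96 weighted points, so the result (8.0) lies much nearer their ratings than the students'.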
A question of prime importance in the case of any test is whether
or not it is applicable to Chinese. Consequently, ten Chinese
advanced graduate students of education were asked to rate the
tests in the same way as the seminar students. The test material
was given to each of the judges and the instructions for giving the
tests were read and explained to them. They were asked, "Is this
test applicable to Chinese? Assign a value of 10 if it can be applied
to Chinese very easily, a value of 5 if it can be applied with some
difficulty and 0 value if it cannot be applied to Chinese at all." The
results of these ratings were not so satisfactory as anticipated.
The writer finally assumed the responsibility, although he was
guided by the ratings of other Chinese judges, to rate the individual
tests. The results are shown in Table XII (page 52).
The ratings of both groups of judges indicate different
values for the different tests. The three best tests according to this
investigation were Tests 4 and 5 of the Army Beta Examination and
Test 4 of the Pressey Primer Scale. This finding was still not
considered final and a further investigation was made.
C. SELECTION OF TESTS BY PARTIAL CORRELATION
To be certain that the other tests to be included in the Chinese
scale should be different in their nature from the basic tests, all the
completion and learning elements should be eliminated from the
TABLE XI
Ratings of the Individual Tests by Competent Judges
[Table not recoverable from the scan. For each test (Army 4 and 5; Dearborn I-7, I-10, I-11, II-2, II-4, II-7, III-1 and III-3; Myers 1 to 4; Pressey 4) it gave each judge's rating on alternative forms (A) and on verbal instruction (V), followed by summary values.]
TABLE XII
Individual Tests Rated re Application to Chinese
Tests                 Application
Army Beta 4
Army Beta 5
Dearborn I-7
Dearborn I-10
Dearborn I-11
Dearborn II-2
Dearborn II-4
Dearborn II-7
Dearborn III-1
Dearborn III-3
Myers 3                   9.0
Myers 4
Pressey 4
other tests. This is done by the method of partial correlation. The
formula¹ used is:
$$ r_{12.3} = \frac{r_{12} - r_{13}\, r_{23}}{\sqrt{(1 - r_{13}^{2})(1 - r_{23}^{2})}} $$
r_{12} = The individual tests and the final criterion,
r_{13} = The individual tests and the basic tests,
r_{23} = The basic tests and the final criterion.
The results are shown in Table XIII. Test 4 of Pressey Primer Scale
has distinctly high partial correlation (.60) with the criterion after
the learning and completion elements are partialed out. As to the
other tests, the partial correlations vary from -.25 to +.43.
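The first-order partial correlation can be checked with a few lines of Python. The sketch below applies the standard formula, taking 1 as the individual test, 2 as the final criterion and 3 as the basic tests. The inputs for Pressey 4 and Army 5 are the correlations reported elsewhere in this chapter (.54 and .25, .52 and .38, with .815 for the basic tests and the criterion); the function name is an assumption.

```python
from math import sqrt


def partial_r(r12, r13, r23):
    """Correlation of 1 with 2 after the influence of 3 is partialed out."""
    return (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# 1 = individual test, 2 = final criterion, 3 = basic tests.
pressey_4 = partial_r(r12=0.54, r13=0.25, r23=0.815)
army_5 = partial_r(r12=0.52, r13=0.38, r23=0.815)
```

For Pressey 4 the function gives about .60, in agreement with the value quoted above; for Army 5 it gives about .39, which lies within the stated range for the other tests.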
D. SELECTION OF TESTS BY A COMPOSITE METHOD
The rating method and the partial correlation method both indicated
the general value of the different tests, but each by itself could
not be used as a basis for the selection of the tests. The best way
was to use a combination of all the available methods together with
a consideration of all the other factors. This could be accomplished
by first summing up the results obtained from the different methods
and then selecting the best tests according to the composite results,
which are shown in Table XIV.
¹ For a complete discussion on the partial correlation method, see Thorndike, E. L.:
Theory of Mental and Social Measurements, p. 182; and Kelley, T. L.: "Table to
Facilitate the Calculation of Partial Coefficient of Correlation and Regression Equations,"
Bulletin of University of Texas, 1916, No. 37.
TABLE XIII
The Individual Tests with the Final Criterion with the
Elements of the Basic Tests Eliminated (r_{12.3} Column)
r_{23} = .815 (r Final Criterion and Basic Tests)
Dearborn I-7
Dearborn I-10
Dearborn I-11
Dearborn II-3
Dearborn II-4
Dearborn II-7
Dearborn III-1
Dearborn III-3
Army 4
Army 5
Myers 3
Myers 4
Pressey 4
.54   .35
In comparison with the other factors, more weight should be
attached to the partial correlations. Consequently, they were
multiplied by 50 so as to equalize the values of the "alternative
forms," "verbal instruction," and "application to Chinese." The
last column is the summing up of the four values. A review of the
combined results shows that the following tests have the highest
values:
Pressey Scale, Test 4    53.70
Army Beta, Test 5        43.33
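The combined value is the sum of the three ratings and fifty times the partial correlation. The Python sketch below shows the arithmetic and a ranking by combined value. The three ratings entered for each test are hypothetical placeholders and are not the figures of Table XIV; only the multiplier of 50 comes from the text.

```python
def combined_value(alt_forms, instruction, application, partial_r, r_weight=50):
    """Sum of the three ratings plus the weighted partial correlation."""
    return alt_forms + instruction + application + r_weight * partial_r


# Hypothetical ratings (0-10) and partial correlations for two tests.
candidates = {
    "Test X": dict(alt_forms=8.0, instruction=6.0, application=9.0, partial_r=0.60),
    "Test Y": dict(alt_forms=9.0, instruction=5.0, application=8.0, partial_r=0.20),
}

values = {name: combined_value(**parts) for name, parts in candidates.items()}
ranking = sorted(values, key=values.get, reverse=True)
```

Because a partial correlation of .60 contributes 30 points while no rating can exceed 10, the multiplier lets the partial correlation outweigh any single rating, which is the stated intention.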
Evidently, Test 4 of the Pressey Scale and Test 5 of the Army Beta
Examination are the best among all the individual tests of the five
scales to add to the basic tests. These two were consequently
definitely selected to be included in the proposed Chinese scale.¹
With the selection of Test 4 of the Pressey Scale and Test 5 of the
Army Beta Examination, to be included in the Chinese scale, it
TABLE XIV
Combined Value of the Individual Tests as Determined by Ratings and
Partial r Method
Tests | Alternative Forms | Instruction | Application | Partial r × 50 | Combined Value
[Values not recoverable from the scan. The rows were Dearborn I-7, I-10, I-11, II-2, II-4, II-7, III-1 and III-3; Army 4 and 5; Myers 1 to 4; and Pressey 4.]
was necessary to consider the character of these two tests in greater
detail. A study of their correlations with other elements showed
the following results:
¹ Test 1 of the Dearborn Group Examination III would likewise have been included,
had it not closely resembled the basic tests. Other important objections to the
Dearborn tests were: first, the value of the test might be due to practice effect; second, the
test, comprising three pages of pictures, was too expensive.
Correlation Between:             Correlation
Pressey 4 and Criterion              .54
Army 5 and Criterion                 .52
Pressey 4 and Basic Tests            .25
Army 5 and Basic Tests               .38
Pressey 4 and Army 5                 .49
Thus Pressey 4 and Army Beta 5 both correlate fairly high with
the final criterion, and rather low with the basic tests. Moreover,
their correlation with each other was not high. This proved
that the two tests were good measures of intelligence, each measuring
traits different from those of the basic tests and from each other.
Because of these special qualities and characteristics of the Pressey
4 and Army Beta 5, they were chosen, along with the basic tests,
to form the proposed Chinese intelligence examination.
E. WEIGHTING BY REGRESSION EQUATION
It has been found that both Army 5 and Pressey 4 should be
included in the proposed Chinese examination. The question then
arises as to the amount of weight to be attached to the two tests
and the basic tests. To solve this problem the regression equation
was used. The regression equation follows:
[Regression formulas not recoverable from the scan. The legible fragments indicate partial standard deviations of the form σ√(1 - r²)√(1 - r²) and higher-order partial correlations of the form r = rA - B.]
TABLE XV
Data for Calculation of Regression Equation
1 = Criterion. 2 = Basic tests. 3 = Army Beta 5. 4 = Pressey 4.
          1        2        3        4
2       .815
3       .52      .38
4       .54      .25      .49
        45.3     3[?]     8.1      5.6
In Table XV the figure "1" stands for criterion; "2" for basic
tests; "3" for Army Beta 5; "4" for Pressey 4. These correlations
were substituted in the above regression equation and the following
result was obtained:
21.10
6.60 ^ 4.26
$$ X_1 = 1.135\, X_2 + 0.578\, X_3 + 2.387\, X_4 $$
(or)
According to the result of the regression equation, the different
tests should be weighted as follows: (a) multiplying the basic tests
score by 1.14; (b) multiplying Army Beta 5 by .58; (c) multiplying
Pressey 4 by 2.39. In consideration of the general impression of
the tests, however, a conservative procedure was adopted. In
giving final weights to the tests, the scores of the Basic Tests and
of Army 5 were left unchanged, while the score of Pressey 4
was multiplied by 2. The weighted composite scores so obtained
(called Composite A) were then correlated with the final criterion,
the correlation found being .812. This result was very satisfactory,
since it exceeds the goal of r = .80. In order to find out whether
the weighting had raised the correlation or not, the raw composite
scores of the Basic Tests, Pressey 4, and Army Beta 5 (called
Composite B) were also correlated with the final criterion, the
correlation found being .789. This showed that the weighting had raised
the correlation slightly.
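The weighting problem can also be approached by solving the normal equations directly, which is the modern equivalent of working through tables of partial correlations. The Python sketch below is offered under that assumption and is not the author's own computation. It uses the six correlations reported in this chapter and yields weights in standard-score form; converting them to raw-score weights would require each test's standard deviation, which is left as a parameter here. All function names are assumptions.

```python
def solve(a, b):
    """Solve a small linear system a.x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically steady.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Order of predictors: basic tests, Army Beta 5, Pressey 4.
intercorrelations = [
    [1.00, 0.38, 0.25],
    [0.38, 1.00, 0.49],
    [0.25, 0.49, 1.00],
]
with_criterion = [0.815, 0.52, 0.54]

# Regression weights in standard-score form.
betas = solve(intercorrelations, with_criterion)


def raw_weights(betas, sd_criterion, sd_predictors):
    """Convert standard-score weights to raw-score weights."""
    return [beta * sd_criterion / sd for beta, sd in zip(betas, sd_predictors)]


def composite_a(basic, army_5, pressey_4):
    """Conservative weighting adopted in the text: Pressey 4 counted twice."""
    return basic + army_5 + 2 * pressey_4
```

In standard-score form the basic tests carry by far the largest weight (about .70), followed by Pressey 4 (about .32) and Army Beta 5 (about .10). The larger raw-score multiplier quoted for Pressey 4 reflects the smaller spread of its scores rather than a greater contribution.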
It should be kept in mind that the tests chosen are not based upon
an empirical method of a single statistical computation, but upon
all the possible available methods, such as correlation, rating by
specialists, partial correlation, regression equation. The test ele-
ments finally chosen from the five scales for the proposed Chinese
Non-verbal Intelligence Examination are:
Test 2 of Pintner Non-language Tests
Test 3 of Pintner Non-language Tests
Test 5 of Army Beta Examination
Test 6 of Army Beta Examination
Test 4 of Pressey Primer Scale
CHAPTER V
RE-TESTING
The tests to be included in the proposed Chinese intelligence
examination having been tentatively selected, the next step was to
determine their reliability and practicability. This could be done
by giving the above tests to the same children and calculating the
correlations of their scores with the final criterion. If the tests are
reliable and practicable, they should correlate highly with the old
criterion. An effort was made, therefore, to secure the same sub-
jects who the year before had taken all the tests. Some of them
had moved out of the district or gone to a higher school and it was
impossible to locate all of them, but finally 190 children (from the
earlier total of 401) were secured.
The re-testing was done from November 28 to 30, 1921, in the
same room where the children were formerly tested. A uniform
environment, which was similar to that at the first testing, was
maintained throughout the examination. The same principal
and the same teacher assisted in timing and policing. As in the first
testing, 28 children were tested at a time; the children being sufficiently
separated from each other, there was no opportunity for
copying. The papers of three children, who continued working
after the "stop" signal had been given, were discarded for the com-
putation of the results, leaving papers for 187 children.
The directions for giving and scoring the tests were the same as
those of the year before, with the exception of a slight modification
in introduction (see Chapter VII for a complete record of the tests).
Preceding the testing, four boys and five girls were individually
interviewed. Each was questioned whether he could recall anything
concerning the tests of the year before. All of them indeed remem-
bered the occasion of the testing — they remembered "the good time
they had had with the Chinese teacher," but not one of them could
recall any of the tests. In other words, these boys and girls had
completely forgotten all about the first test, except for the vague
idea of having done it. It is possible that the actual performing of
the tests might recall the experience in previous testing, but in
young children of this age the likelihood of recalling the tests of
the year before seems so slight as to be immaterial. Consequently,
the process of re-testing these children cannot be said to be influ-
enced to any noticeable degree by repetition.
In the re-testing, the children appeared to enjoy their work.
There was no sign of fatigue; instead, they were very enthusiastic.
The writer obtained some interesting information, on the effects
of the tests upon the children, by mixing with them during the recess.
Joining in their play, he was constantly approached by them with
such remarks as, "Mister, play some more games with us." "When
will you come back again?", "Oh, I like to see the woman without
a nose, and the poor fish without an eye," "There's lots of fun in
making zeros and crosses."
The testing took from 25 to 30 minutes for every group; none
needed more than 30 minutes or fewer than 25. This does not include
the time taken in the distribution of test material nor for the
preliminary remarks by the examiner.
The method of scoring the tests was very simple. Stencils were
prepared in order to facilitate the work. With a small amount of
practice, test papers could be scored very rapidly, even at the rate
of a paper a minute.
B. STATISTICAL STUDY
The first step was to determine the general merits of the selected
tests, from now on known as "The proposed Chinese Non-verbal
Intelligence Examination." Tables of grade distribution and age
distribution were prepared (Tables XVI and XVII), and the
medians for the different grades and ages were calculated. The
medians found for the different grades were: Grade III, 101.36;
Grade IV, 125.26; Grade V, 148.50. The result was encouraging
as it showed a fair improvement in central tendencies for the
different grades.
The median scores for the different ages were found as follows:
Age 8, 85; age 9, 115; age 10, 128; age 11, 142; age 12, 148. The
result was also encouraging. The medians for ages 11 and 12 were
close to each other, probably because the 12-year-old children in
these grades were duller than the average 12-year-old.
The last step was to find out how closely the test scores of the
selected tests corresponded with the old final criterion, which was
used as a standard for the measure of general intelligence. Consequently,
a scattergram was made and the correlation found, by the
TABLE XVI
Distribution of Re-testing Scores by Grades
[Frequencies not recoverable from the scan. The table listed re-testing scores in intervals of ten points from 10-20 to 170-180, with the number of cases and the median for each grade.]
product-moment method, to be .8768. The result was very satis-
factory. Theoretically the correlation between the selected tests
and the old criterion should be higher than the correlation between
TABLE XVII
Distribution of Re-testing Scores by Ages
Re-testing Scores   Age 7   Age 8   Age 9   Age 10   Age 11   Age 12   Age 13
[Frequencies by score interval, from 0-10 to 170-180, are not recoverable from the scan.]
No. of Cases         [?]     57      68      44        9        4        2
Median                       85      115     127.7    142.2    147.5
any of the five scales with the old final criterion. This was proven
true, as shown in the following:¹
Correlations Between the Final Criterion and the Different Scales
Scales                          Correlation
The Selected Tests                  .88
Dearborn Group Tests                .80
Pintner Tests                       .78
Army Beta Examination               .75
Myers Mental Measure                .65
Pressey Primer Scale                .58
¹ The first correlation is not strictly comparable to the others since it was obtained
from the 187 cases of re-testing while the others were from the more than 350 cases
in the first testing.
Judging by the results of the correlation of the selected tests with
the old final criterion, by the comparatively short time to give the
tests, and by the deep interest displayed by the children in doing
the tests, together with their other merits, it seems fair to conclude
that the selected five tests which are included in the proposed Chi-
nese Non-verbal Intelligence Examination give better results than
any of the five scales used in this experiment.
CHAPTER VI
ALTERNATIVE FORMS AND STANDARDIZATION
A. ALTERNATIVE FORMS
Although the selected five tests are to be considered the best
among the five scales used in the experiment, they cannot be applied
to Chinese as satisfactorily as to American children. For instance
in tests 2, 3 and 5, Arabic figures are used in substitution and num-
ber-checking. Arabic figures are taught in all of the modern Chinese
schools, but the children who have not attended a modern school or
learned the Western arithmetic are wholly ignorant of the meaning
of them. Chinese children, unless they come from better class families
in some modern city such as Shanghai, also will be greatly handicapped in
performing Test 1. They can hardly be expected to draw the filament
of an electric bulb. They cannot place a postage stamp in its proper
American position on the envelope, nor complete the drawing of a
pistol, a bowling game, a phonograph or a tennis net, for these
objects are rare in China. The same may be said of Test 4: the
telephone, the gloves, the ABC, the American flag, the music scale, and
so on, are most likely unknown to 99 per cent of Chinese children.
Consequently, these tests cannot be applied unless alternative forms
are devised. As explained in previous chapters, alternative forms
have distinct advantages, besides their application to Chinese, such
as the prevention of coaching and the provision of material for
retesting.
In preparing the alternative forms, the criteria first adopted
were strictly observed. One point was especially emphasized;
namely, that the test material should be drawn from a social
environment common to all people and the test should measure only
those mental traits which every child has an equal opportunity to
develop. This means that the test material selected should not be
dependent upon any social or educational advantages. An attempt
also was made to bring all of the alternative forms to yield the same
result. The writer, however, cannot claim credit for such an
achievement as yet, because the tests have not been tried out in
China.
The first step in preparing the alternative forms was to devise a
large number of test items. These were then submitted to ten
graduate students originally from different parts of China. They
were asked, "Is this common in your locality?" All the test items
which were marked "Not common" in any of the localities were
discarded. The selected test-elements were submitted to 3 Japanese,
2 Filipinos, 2 Indians, 2 Britons, and 2 Americans; and they were
asked the same question, "Is this common in your country?"
All those which were marked "Not common" were again discarded.
Those remaining from this double sifting were finally gathered and
sorted into forms.
Different methods are required for placing the test-items in the
different individual tests. For tests 2, 3 and 5, the selection of the
symbols was made by the chance method of tossing coins. For Tests
1 and 4, the pictures were arranged, by the combined judgments of
three experts, according to their degree of difficulty, beginning with
the easiest, and ending with the most difficult ones. The best
method for arranging the tests in the order of their difficulty would
be one in which the tests are given to several hundred children, with
the answers scored either right or wrong, and the per cent of correct
answers obtained.
In Tests 1 and 5 of the Chinese non-verbal forms, the preliminary
demonstration is modified. To be uniform with the other tests, the
marks and pictures to be used for the preliminary demonstration
are printed at the top of each sheet. This is an improvement also
because the use of a blackboard may be inconvenient or unfair.
The alternative forms thus devised cannot be claimed as the final
forms. They must yet be tried out upon a large number of children,
the norms for ages and grades must be computed, and the tests
scaled; but judging by the results of the experiment, there is every
reason to believe that the tests will prove reliable and useful.
B. STANDARDIZATION
The last step of scale construction is standardization, the obtaining
of norms and scaling of the tests. In order to do this for the
proposed Chinese Non-verbal Intelligence Scale, it is necessary to
give it to a large number of Chinese subjects, perhaps 5000.
The selected tests were only applied to about 200 pupils, very few
of whom were Chinese. The devised alternative forms, furthermore,
cannot be tried out in America. It was thus impossible to secure
any age or grade norms to be reported here or to scale the tests.
The final standardization must be done in China. However, the
technique may be briefly discussed here.
1. Norms
The purpose of mental measurement is to reveal individual and
group differences of intelligence. To perform such a function, norms
or standards of achievement for different ages and possibly grades
are required. We cannot, however, test all the Chinese people
between certain ages and compute the average achievement of each
age. This is unnecessary as well as impracticable. The obtaining
of reliable norms does not require the test of every child in the coun-
try, but it is essential that the subjects selected should be in random
sampling, representing the whole range of intelligence from a low
degree of moron to a high degree of genius. It is also essential that
the subjects should be representative of all types of social environ-
ment in different parts of the country.
Norms are more valuable when they are stable. When a norm is
stable, it indicates that the subjects are selected from random sampling
and the number of cases is sufficient. As a rule, the greater the
number of cases taken, the more stable are the norms; certainly a
norm can be claimed to be stable only when it reaches the point
where the addition of new cases does not materially alter the previous
determination. The safest way to tell whether the norms are
stable or not is to average the scores of a varying number of cases
and watch the resulting fluctuations in the average. McCall states
that "when the addition of, say, 100 cases does not materially alter
the previously determined norm, the norm has stabilized."¹
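The stability check just described, averaging an increasing number of cases and watching the fluctuation, may be sketched in Python as follows. The batch size of 100 follows McCall's example; the tolerance that defines a "material" change is not given in the text and is an assumption the examiner must supply. Function names are hypothetical.

```python
def running_norms(scores, batch=100):
    """Average of all cases so far, recorded after each added batch."""
    norms = []
    for end in range(batch, len(scores) + 1, batch):
        norms.append(sum(scores[:end]) / end)
    return norms


def is_stable(norms, tolerance):
    """True when the latest batch moved the norm by no more than `tolerance`.

    Assumption: "materially alter" is read as an absolute change in the
    average larger than `tolerance` score points.
    """
    return len(norms) >= 2 and abs(norms[-1] - norms[-2]) <= tolerance
```

For example, `running_norms(scores_for_age_10)` would give the successive averages for one age group, and `is_stable(..., tolerance=1.0)` would report whether the last hundred cases changed the norm by more than one point.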
¹ McCall, W. A.: How to Measure in Education, p. 315.
Norms for both age and grade should be worked out. However,
in China the age norms will be more important than the grade norms,
as the grades are not uniform in the schools. Care must be taken,
however, in obtaining ages to record the actual date of birth according
to both old and new calendar, as many subjects, undoubtedly,
will follow the custom of reporting ages by years although they may
be born at the end of the year.¹
2. Scaling
After the tests have been applied to a large number of subjects
and the norms are obtained, scaling is comparatively an easy task.
There are numerous methods of scaling tests. For the Chinese scale,
the writer plans to adopt one or both of the two most commonly
used methods — an age scale and a percentile scale.
(a) Age scale: The construction of an age scale merely requires
the determination of stable norms. Given a norm for each age, any
pupil's test-score may be transmuted into a mental age and intelli-
gence quotient. Mental age is obtained from a comparison of the
subject's performances with the standard for normal children of the
same age. Let us suppose the subject tested is 10 years of age. If
he can do as much as normal 10-year-old children do, the child
has a mental age of 10, which in this case is normal. If he goes as
far as normal 8-year-old children go, his mental age is 8. In this
case, he is subnormal. In like manner a mental defective 10 years
old may have only a mental age of five, and a genius of the same age
may have a mental age of 13 or 14.
The intelligence quotient, often designated as I Q, is the ratio of
mental age to chronological age. It is a valid expression of
intelligence. On the basis of the Stanford Revision of the Binet Scale,
Terman² suggests this classification of intelligence quotients:
I Q             Classification
Above 140       "Near" genius or genius
120-140         Very superior intelligence
110-120         Superior intelligence
90-110          Normal, or average intelligence
80- 90          Dullness
70- 80          Border-line deficiency
Below 70        Feeble-mindedness
¹ According to the old custom in China, which still prevails in many portions of the
country, age is reckoned in years, according to the calendar. For example, a man
whose 25th birthday comes in December would be considered as already 25 years of
age in the preceding January. This may be explained as resulting from the literal
translation of Chinese into English. In the Chinese language, age or "sui" is expressed
in the phrase "in the 25th year," whereas in America this would be translated as "25
years old."
² Terman, L. M.: The Measurement of Intelligence, p. 79.
(b) Percentile scale: The technique of percentile scale construc-
tion is described in detail by Pintner.¹ After the test papers have
been scored, a distribution table for each test is made. The per-
centiles are then calculated for each test, counting usually from the
lower end of the table. The 25-percentile or Q1 is that score which
is found by counting one-fourth of the scores. The 75-percentile is
found by counting three-fourths of the scores. Similarly, the 10-
percentile is found by counting one-tenth of the scores, the 20-per-
centile by counting one-fifth of the scores, and similarly for any other
percentiles. After the percentiles are calculated, the percentile table
for each test should be prepared. To get the mental index of any
individual, his percentile placement for each test is found by com-
paring his score with those found in the table, and then the median
of these various placements is found. Similarly the mental index
for the class, for the grade, and for the entire school can be found.
For purposes of rough classification, Pintner has adopted the follow-
ing scheme:
PERCENTILE     CLASSIFICATION
84-100         Very bright
72-83          Bright
39-71          Average
22-38          Backward
Below 22       Dull
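The percentile procedure may be sketched as follows. The sketch rests on assumptions that are not in the source: a pupil's percentile placement is taken as the percentage of norm scores at or below his score, the norm scores shown are hypothetical, the function names are illustrative, and the lowest class is taken to be everything below 22.

```python
from bisect import bisect_right
from statistics import median


def percentile_placement(score, norm_scores):
    """Percentage of the norm scores at or below the given score (assumed definition)."""
    ordered = sorted(norm_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


def mental_index(placements):
    """Median of a pupil's percentile placements on the several tests."""
    return median(placements)


# Lower limits of the classes in Pintner's scheme, highest class first.
PINTNER_BANDS = [
    (84, "Very bright"),
    (72, "Bright"),
    (39, "Average"),
    (22, "Backward"),
]


def classify_percentile(index):
    """Return the rough class for a mental index."""
    for lower_limit, label in PINTNER_BANDS:
        if index >= lower_limit:
            return label
    return "Dull"


# Hypothetical distribution of ten scores on one test.
norms = [3, 5, 7, 9, 11, 13, 15, 17, 19, 21]
placement = percentile_placement(13, norms)

# Hypothetical placements of one pupil on three tests.
index = mental_index([placement, 45.0, 80.0])
print(placement, index, classify_percentile(index))
```

The same `mental_index` function applies to a class, a grade, or a school when it is given the placements of all the pupils concerned.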
¹ Pintner, R.: The Mental Survey, p. 38 ff.
CHAPTER VII
THE CHINESE NON-VERBAL TESTS ¹
A. THE NATURE OF THE TESTS
The measurement of intelligence has recently become widespread
in America. It has been proved very helpful in solving many admin-
istrative problems. With the hope of facilitating Chinese educa-
tional work, these tests are therefore introduced.
The tests were scientifically constructed for the measurement of
mental ability. They are applicable to a large number of children
at a time, who are in the Citizens' Schools or Higher Primary
Schools. There are four forms, all of equal value. It is advisable
to use different forms in various grades, so as to prevent coaching.
The period of testing does not exceed thirty minutes. It will enable
the teacher or school administrator to measure the mental ability
of pupils in groups for the following purposes:
1. Classification. The object of classification is to divide into
homogeneous groups the pupils whose needs are similar, in order
that work can be more exactly adapted to them. With the applica-
tion of these tests, a teacher can scientifically determine the mental
ability of his pupils in a rapid and accurate manner.
2. Promotion. The variability in the ability to learn among
children of any grade is great, and their progress is not at an equal
rate. It is obviously unwise to attempt to force all of them to keep
the same pace in their class work at one time. The bright pupils
therefore should be promoted as fast as their ability permits them
to absorb their work, or their courses of study should be enriched;
while slow ones may be given more time or requirements upon them
may be reduced to the minimum essentials.
3. Provision for the Backward. These tests may give a valuable
indication of the probable causes of difficulty with troublesome back-
ward children. Their restlessness, incorrigibility, and lack of school
progress may be due to a mentality unequal to the strain of ordinary
school work. The tests may therefore indicate those who should be
segregated from the normal class and given special courses of study.
¹ The material in this chapter is translated from the Manual of Directions.
4. Vocational Guidance. These tests will not give prognosis of
fitness for specific trades or professions except along broad lines; they
are selective. The test scores will show whether a child should be
encouraged to take a profession or do unskilled work. For instance,
it would be absurd to encourage a child whose test indicates feeble-
mindedness to study medicine or one with a genius to be a riksha
coolie.
Although these tests are primarily devised for the use of school
children, they will be of aid to the employer in making a hasty classi-
fication of his employees, especially the unskilled laborers; and will
aid the employee to find early the place for which he is best fitted.
B. INSTRUCTIONS TO EXAMINER
1. Any intelligent person who has a pleasing personality can
conduct a group examination with these non-verbal tests in a reason-
ably satisfactory manner and obtain fairly reliable results.
2. The examiner cannot give the examination satisfactorily until
he has thoroughly mastered the technique. He should try the tests
out on a smaller group of children than the one to be tested and then
memorize the procedure. However, he should always read the
directions from the manual.
3. The room for testing should be provided with chairs and desks.
It should be free from distracting noises within or without. No visi-
tor, school authority, or pupil should be permitted to enter or leave
the room during an examination unless the reason for so doing is
imperative. The school administration should so arrange the place
and time of testing that no one be permitted to weaken the value of
tests by distracting the attention of the children in any manner.
4. Children should use pencils rather than pens. Each child
should be provided with two pencils (with eraser) and the examiner
should always have on hand a supply of sharpened pencils to be used
if needed. If a child breaks his pencil, the examiner should supply
another with entire quietness and as little loss of time to the child
as is possible.
5. It is better for the examiner to remain at the front of the room
during the entire testing. He should ask the assistant, or appoint
several pupils, to distribute the test papers.
6. Before the examination begins, those to be tested should be
made to feel comfortable, and in an easy, contented but responsive
frame of mind. Every effort should be made to make the testing
as informal and as much like a game as possible, yet precision and
exactness in obeying all the rules that have been worked out for
administering the tests are essential. Otherwise the results obtained
in different schools will be untrustworthy and not comparable.
7. In a given school, children should be tested in order from the
lower grade upward. So far as possible, the same examiner should
give all the examinations within the school.
8. The examiner should give the directions in a clear, energetic
voice. He should speak distinctly, at moderate speed and loud
enough to make his voice clearly audible to all the pupils in the room.
He must make sure that each step is understood by all, that they
turn to the proper page when the new test is to be begun, and that
they give instant obedience to his directions.
9. The directions for giving the tests should be followed literally.
Avoid all impromptu directions since such variations may modify
the results. Even though the directions are memorized, they should
always be read from the manual in giving a test.
10. All should start and stop together. If a child comes in late or
leaves the room early, or his work otherwise is interfered with, a note
of the fact should be put on his paper at the time.
11. Accurate timing of the results is of great importance. Use a
watch with a second hand. Have an assistant to act as timer if
convenient.
12. The children must be constantly watched for copying. Every
precaution against cheating should be taken, yet the manner of the
examiner should not be accusing or offensive to the self-respect of
the pupils.
C. DIRECTIONS FOR GIVING THE TESTS
Read: "Would you like to play a game?"
(For pupils who can read and write.) "Before we begin I must ask
you a few questions. First, I want to know your name. Please
write your name at the upper right corner." (Hold up test blank
and point.) (Pause.) "Have you all done that?" (Pause.) "That
is fine! Now answer all the questions on the whole page." (Pause.)
"Who has not as yet finished?" (Pause.)
(If subjects cannot read or write at all, begin here.)
After all the test blanks are filled out, the examiner should say:
"Now I want to tell you something about the game. I am going to
ask you to do things for me. Some of them will be very very easy
and some will be hard. You will not be expected to do all of them,
but do the very best you can. You must listen carefully to what I
say or you will not know what to do. After I say, 'Go,' don't ask
any questions and don't look at anybody's paper but your own.
When I say, 'Stop,' you are to quit work at once, even if you have
not finished. If you finish before I say, 'Stop,' put your hands back
of your head."
Test 1. Picture Completion
"Now turn the first page like this." (Hold up the test blank and point.)
"Here," (pointing) "is the first game. Have you all found it? That is fine."
(Pointing to it.) "Now look at the pictures at the top of the page" (pointing).
"There is something gone or missing from each of these pictures. What is the
matter with the hand? What is left out?" (Pause.) "Yes, one finger is gone.
Take your pencil and put in the finger." (Pause.) "What is the matter with
the fish? What is left out?" (Pause.) "Yes, that is right. The eye is gone.
Put in the eye." (Pause.) "What is the matter with the table?" (Pause.)
"That is right. One leg is gone. Draw the leg on." (Pause.) "Now listen!
There are other pictures on this page. None of them are finished. Every one
has something gone or left out. I want you to find out what is gone in every
picture and then put it in. Ready. . . . Go!" (Time limit is three minutes.)
"Stop! Hands back of your heads!"
Test 2. Easy Learning
"Turn over the page and fold your book like this." (Show how to do it.)
"Here is the second game" (pointing). "Have you all found it? That is fine.
Look at the three boxes at the top of the page. Now watch me." (Hold up the
test blank and point.) "The two marks in this box must always go together"
(pointing to the first box at the top of the page). "The two marks in this box
must always go together" (pointing to the second box at the top of the page),
"and the two marks in this box" (pointing to the third box at the top of the page)
"must always go together. Now you must put in all these boxes that have
only one mark" (pointing), "the other mark that belongs with that one. Do you
understand?" (In case the children do not understand, reread the directions be-
ginning at the [...].) "Some of the boxes in the first row have already been filled in the
way they should be. When I say 'Go,' I want you to put in each box the mark
that belongs with the mark that is there. Ready. . . . Go!" (Time limit is
three minutes.) "Stop! Hands behind your heads!"
Test 3. Hard Learning
"Turn over your book and fold it like this." (Show how to do it.) "Here is
the third game. It is like the second one, only there are more boxes with more
kinds of marks to make. Have you all seen the boxes at the top of the page?"
(pointing and waiting a moment). "Now watch me. The two marks in this
box" (hold up the test blank and point to the first box at the top of the page),
"must always go together, the two marks in this box" (point to the second box at
the top of the page), "must always go together, the two marks in this box" (point
to the third box at the top of the page) "must always go together, and so on"
(point to all the rest of the boxes at the top of the page). "Do you understand?
That is fine. Some of the boxes in the first row have already been filled in the
way they should be. When I say 'Go,' I want you to put in each box the mark
that belongs with the mark which is there. Ready. . . . Go!" (Time limit is
three minutes.) "Stop! Hands back of your heads!"
Test 4. Absurdities
"Now turn over the page and fold your book like this." (Show how to do
it.) "Here are pictures. Every one of them has something wrong. I want
you to find out what is wrong and cross it out with your pencil. Look at the first
picture. What is wrong with the boy's face?" (Pause.) "Yes, the eye. Cross
out the eye, because it is wrong." (Make a gesture to show how to make a
cross.) "What is wrong with the bird in the next picture?" (Pause.) "Yes,
the bird has two heads. Which one is wrong?" (Pause.) "That is right.
Cross out the upper head." (Pause.) "What is wrong with the third picture?"
(Pause.) "Yes, his foot is turned the wrong way. Cross out the foot." (Pause.)
"Now listen! Mark the other pictures on the whole page in the same way. In
each picture, cross out the one part that is wrong. Ready. . . . Go!" (Time
limit is three minutes.) "Stop! Hands back of your heads!"
Test 5. Mark Checking
"Turn over your book like this." (Show.) "Have you all seen the rows of
marks and the little boxes?" (Point.) "That is fine! I want you to find out
whether the marks in each row are the same. If they are the same you are to
put 'x'" (make a gesture to show how the 'x' is made), "in the little box
at the side. If they are different you are to put in 'o'" (make a gesture to show how
the 'o' is made), "in the little box at the side. Look at the ones at the top of the
page. Are these first two the same?" (pointing and pause). "Yes, so the mark in
the box is 'x.' Look at the next ones" (pointing), "are they the same?" (pause).
"No, so the mark in the box is 'o.' The next ones are not the same, so the mark
is 'o.' In the next one which should we put in, 'x' or 'o'?" (when some child
gives 'x' say) "Yes, all of you put 'x' in the little box. In the next one which
should we put in?" (Pause.) "Yes, 'o' is right. All of you put 'o' in the little
box. Now when I say 'Go' you are to put the right mark in all the little boxes
all the way down the page, on one side, and then all the way down on the other
side." (Pointing.) "Ready. . . . Go!" (Time limit is three minutes.) "Stop!
Hands back of your heads!"
(Collect the test booklets at once, not permitting any time for
further work.)
D. DIRECTIONS FOR SCORING THE TESTS
Keep the test papers for each group together and score test by
test in one whole set, rather than running through all the tests in
each paper.
Use keys and stencil for scoring.
Where accuracy is desired, all scoring should be checked by a
second scorer.
In scoring, mark the correct items by a check (✓) and indicate an
error by o.
When an item evidently has been corrected by the pupil, the cor-
rection is the answer to be scored.
The score for each test should be entered in the upper right-hand
corner of the test paper, and encircled. When the scoring has been
checked, a check mark may be made beside the circle.
Test 1
1. Score is number right.
2. Allow much awkwardness in drawing. Writing in name of
missing part, or other way of indicating it, receives credit, if idea is
clear.
3. Additional parts do not make items wrong, if the proper miss-
ing part is also inserted.
Tests 2 and 3
1. Score is number correct. Maximum score is 50.
2. Lay the transparent stencil on the paper. The correct sym-
bols will appear just below the child's symbols.
Test 4
1. Score is number right.
2. Any mark that clearly indicates correctly the absurd object
should be scored as correct.
Test 5
1. Score is right minus wrong number (number of items checked
that are correctly checked minus number of items checked that
are wrong). Pay no attention to omissions.
2. If other clear indications are used, instead of crosses and zeros,
give credit.
3. If pupils give nothing but crosses or zeros the score of the test
is zero.
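The three scoring rules for Test 5 may be sketched as a short function. This is a sketch under stated assumptions and is not taken from the manual: the representation of a paper as a list of 'x', 'o', or None (for an omitted item) is hypothetical, and rule 3 is read as applying when every mark the pupil made is the same symbol.

```python
def score_mark_checking(responses, key):
    """Score Test 5 as number right minus number wrong, ignoring omissions.

    responses: the pupil's marks, each 'x', 'o', or None for an omitted item.
    key: the correct mark for each item, 'x' or 'o'.
    """
    # Rule 1: pay no attention to omissions.
    attempted = [(r, k) for r, k in zip(responses, key) if r is not None]

    # Rule 3 (as read here): nothing but crosses, or nothing but zeros,
    # scores zero. A paper with no marks at all also scores zero.
    marks_used = {r for r, _ in attempted}
    if len(marks_used) <= 1:
        return 0

    right = sum(1 for r, k in attempted if r == k)
    wrong = len(attempted) - right
    return right - wrong


key = ["x", "o", "o", "x", "o"]
print(score_mark_checking(["x", "o", "x", None, "o"], key))  # three right, one wrong
print(score_mark_checking(["x", "x", "x", "x", "x"], key))   # nothing but crosses
```

The manual does not say whether a negative difference is to be recorded as such; the sketch returns it unchanged.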
E. TREATMENT OF RESULTS
1. When the scoring is finished the test papers should be arranged
or grouped according to the age or grade of the children.
2. Then the mean for each group is calculated by adding together
all the scores of the individuals in that group and dividing the sum
by the number of individuals. The mean so obtained may be used to
represent the attainment of the age or of the class.
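The grouping and averaging may be sketched as follows. The sketch takes the mean in the usual sense, the sum of a group's scores divided by the number of pupils in the group; the record layout and the sample scores are hypothetical.

```python
from collections import defaultdict


def group_means(records):
    """Return {group: mean score} for an iterable of (group, score) pairs.

    The group may be an age or a grade, as the examiner prefers.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, score in records:
        sums[group] += score
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Hypothetical total scores, grouped by grade.
records = [("III A", 40), ("III A", 50), ("IV A", 60)]
print(group_means(records))
```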
F. CAUTION
A caution should be urged against relying too exclusively on the
bald test scores as a basis for administrative action. These tests
when properly administered are fairly reliable as a measure of intel-
ligence. There is always the possibility, however, that the child for
some reason may have failed to do himself full justice in the test.
He may have been sick, or he may not have taken the testing se-
riously. There is also the possibility that the examiner, or scorer, has
made a statistical error. So the results of the tests should be inter-
preted in the light of all such supplementary information as may be
available. In the small number of cases where there is a clear dis-
agreement between the results of the tests and other data, such as
school marks, teachers' estimates, and so on, an alternative form of
the test may be repeated and the scores compared. It should be
especially pointed out that the tests are not a substitute for
common sense on the part of teacher or principal.
CHAPTER VIII
SUMMARY AND CONCLUSIONS
1 . Progressive Chinese educators who are planning to introduce
the measurement movement into China are confronted with the
problem of procuring suitable test material. China, with her dis-
tinctive civilization, with her numerous dialects and her lack of
universal education, encounters great difficulty in the application
of language tests. This study is an attempt to develop a non-verbal
scale which, because of the elimination of language, environmental
and educational factors, may be used either as an independent
measure of general intelligence or as a supplement to a language
test.
2. Instead of forming a purely Chinese test, it was decided to
select the most useful elements from the best known American non-
verbal tests which have already been standardized. The following
tests were chosen for experimentation: The Myers Mental Measure,
The Pressey Primer Scale, The Pintner Non-language Tests, The
Army Beta Examination, and The Dearborn Group Tests of Intelli-
gence (Examinations I, II, and III).
3. The selected tests were given in Public School No. 108 of the
Chinese section of New York City to 401 children of Chinese, Ital-
ian, and Hebrew descent. Most of the children were Italians.
Since the purpose of the study was to select the best test elements
and since it was not intended to derive norms, it made no difference
whether the subjects were Chinese or of any other nationality.
4. After many trials, the criterion was fixed as a weighted
composite of age, teachers' estimates, school marks, school progress
and test scores. Each of these measures general intelligence, to
some degree, in a different way, so their combination should be
reliable. The criterion is the standard extensively used for the
selection of the best test elements from the five scales.
5. By the methods of correlation, rating, partial correlation, and
regression equation, the test elements of the five scales were checked
against the criterion to determine their validity. The most
valid tests were selected to form the Chinese Non-verbal Intelli-
gence Examination. Those so chosen are Tests 2 and 3 of the
Pintner Non-language Tests, Tests 5 and 6 of the Army Beta Examina-
tion, and Test 4 of the Pressey Primer Scale.
6. The tests thus selected were given to the children who the
year before had taken all of the five tests. The correlation between
these test scores and the final criterion was .8768, which was higher
than the correlation of any one of the original examinations with the
criterion.
7. Although the selected tests are the best among the five scales,
they cannot be applied to Chinese children as successfully as to
American children, because of the unfamiliarity of the Chinese chil-
dren with the objects shown or situations represented. Conse-
quently alternative forms were prepared which are international
in nature and are not influenced by schooling or civilization. The
alternative forms will also prevent coaching.
8. The Chinese Non-verbal Tests so constructed are applicable
to a number of children at one time in the elementary school. The
period of examination does not exceed thirty minutes, but this short
time will enable a teacher or school administrator to measure the
native ability of the pupils as an aid in classification, promotion,
provision for the backward, improvement in methods of teaching,
and vocational guidance.
9. The norms are to be established in China and the final stan-
dardization will take place there.
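The correlation of .8768 reported in item 6 is a coefficient between two lists of paired values, the test scores and the criterion. The summary does not name the formula; the sketch below assumes the ordinary product-moment (Pearson) coefficient, and the paired values shown are hypothetical, not the study's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least two pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical test totals and criterion values for five pupils.
test_totals = [30, 42, 51, 58, 64]
criterion = [120, 150, 171, 186, 200]
print(round(pearson_r(test_totals, criterion), 4))
```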
APPENDICES
A. Sample of Form A of the Chinese Non-Verbal
Intelligence Examination
[Figure: sample page of Form A showing rows of symbol boxes, with a panel apparently titled "Hard Learning"; the symbols are not recoverable from the scan.]
B. Sample of Records Kept
The following is a sample from the original record book which is
now kept in the library of Teachers College, Columbia University.
All those who are interested in the full record may have access to
it by communication with the proper authorities.
Boys-
No
Ag.
Yr. Mo
Nation-
ality
Health
P^o
School
Marks
Teadien'
Estimates
Grade
1
3
3
4
5
6
7
|o. C.
iiS
9 —
Italian
Teeth
Ye>
CCB
III A
W. L.
"19
9 —
Chinese
Teeth
Ye*
BBB
III A
)o. M.
9 —
lUlian
Teeth
Ye<
BBB
III A
De.M.
121
9 —
lUlian
Tonsils
Yes
B +
ABA
riiA
Ai. N.
'"
9 —
Italian
Teeth
Yea
DDD
III A
Thorndike-McCall
Reading Scale
Credit Assigned to
T
Score
Reading
Age
R.Q.
Age
School
Marks
Teachers'
Estimates
School
8
9
10
It
13
«3
U
38,5
35
37
107
9o;o
89
104
106
89
7
7
7
8
7
2 3 3
3 3 3
3 3 3
4 3 4
5
5
5
5
5
Col. 15            Col. 16                                  Col. 17        Col. 18
School Criterion   Pressey + Pintner + [?] × Myers          Tests Total    Final Criterion (3 × Sch. Crit.
Total              + 2 × Beta + Dearborn                                   + 2 × Tests Total)
26                 484                                      48             174
28                 58[?]                                    58             200
28                 513                                      51             186
31                 389                                      39             171
22                 337                                      34             134
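The heading of the last column of this record appears to define the final criterion as three times the school criterion plus twice the tests total. The sketch below checks that reading against two rows whose figures are legible in the record (26, 48, 174 and 31, 39, 171). The function name and the default weights are assumptions drawn from that heading, not a statement of the author's full procedure.

```python
def final_criterion(school_criterion, tests_total, w_school=3, w_tests=2):
    """Weighted composite: w_school x school criterion + w_tests x tests total."""
    return w_school * school_criterion + w_tests * tests_total


# (school criterion, tests total, final criterion as recorded)
legible_rows = [(26, 48, 174), (31, 39, 171)]
for school, tests, recorded in legible_rows:
    print(school, tests, final_criterion(school, tests), recorded)
```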
Pressey Primer Scale
19
30
21
22
23
n
Total Score
Test!
Test 11
Test III
Test IV
Test II-IV
38
21
'7
So
21
31
18
20
41
81
34
19
ai
17
36
44
8
tl
14
II
33
74
ai
18
18
16
34
Pintner Non-Language Tests
25
26
27
38
39
30
31
32
Test Total
Test I
Test II
Test III
Test IV
TestV
Test VI
IMII
93
J
35
38
14
4
73
lOI
I
38
33
18
6
5
71
80
4
27
27
"4
4
4
54
58
33
3Z
'4
44
13
"
"
°
10
'
'
°
Myers Mental Measure
33
34
35
36
37
TMt Total
Test I
Test II
Teat in
Test IV
"3
,
3
7
3
39
3
'3
8
5
14
3
3
4
5
14
3
9
2
13
3
3
3
3
Army Beta Examination
38
39
40
41
43
43
44
45
46
47
Tert Total
I
11
III
IV
V
VI
VII
IIMV
V-VI
IV-VI
SSH
7
6
ISK
"3
11
47
38
77H
9
6
2m
37
14
69
36
83^
5
8
3S«
38
13
3
75
39
70>i
8
9
8
17?^
"4
8
6
48
36
SSH
6
"
4
tt>H
17
la
°
50
39
Dearborn Group Tests, Series I
48
49
50
51
Grand Total
Exam. I
Exam. II
Exam. Ill
171
60
86
35
'79
78
59
43
183
58
89
34
147
53
8r
'3
150
51
74
35
Dearborn Examination 1
[Table: record-book columns 52 to 69, giving the total and the separate test scores of Dearborn Examination 1; the entries are not recoverable from the scan.]
Dearborn Examination 2
70
71
73
73
74
75
76
77
Total
I
II
III
IV
V
VI
vn
86
9^
15
ai
14
11
12
4
59>^
S}4
IS
>5
3
9
t
89><
9>i
15
24
II
9
13
8iK
9J<
13
24
II
14
11
74
lO
15
»5
10
S
"
5
Dearborn Examination 3
78
79
80
Si
83
Total
I
II
III
IV
35
14
I,
42
15
4
as
34
14
9
II
13
3
3
8
35
9
16
Tests Combinations
83
84
85
86
Beu m-rv-v
VI Pressey II-IV
''B«talV-VI
PnMey II-IV
104
133
93
94
36
104
105
52
69
84
72
67
36
80
63
51
S3
Tests Combinations
87
88
8»
Pintner 11-111 Beta VI
Dearborn I
1-VI
DearboTD I Vll-XV
87
14
as
84
14
39
63
'4
29
66
■ 2
33
8
14
■ , 3'
90       91     92     93     94     95
Total    I      II     III    IV     V
166      50     48     33     14     21
170      44     50     35     18     23
143      50     49     15      8     21
116      23     49     11     13     20
116      49     38      6      8     15
BIBLIOGRAPHY
Ayres, L. P. "The Binet-Simon Measuring Scale for Intelligence: Some Crit-
icisms and Suggestions." Psychological Clinic, Vol. V (1911), pp. 187-96.
Binet, A. and Simon, T. "Le développement de l'intelligence chez les enfants."
L'Année psychologique, 14 (1908), pp. 1-94.
Binet, A. and Simon, T. "L'intelligence des imbéciles." L'Année psycho-
logique (1909), pp. 1-147.
Chen, H. C. "Educational Research in China." Journal of Educational
Research (May, 1921), Vol. III, No. 5, p. 394.
Chen, H. C. and Liao, C. S. Mental Tests. Commercial Press, Shanghai,
China (1922).
Dearborn, W. F. The Dearborn Group Tests of Intelligence. J. B. Lippincott
Co. (1920).
Fretwell, E. K. A Study in Educational Prognosis. Teachers College Con-
tributions to Education, No. 99 (1919).
Goddard, H. H. "The Binet-Simon Measuring Scale of Intelligence Revised."
Training School Bulletin, Vol. VIII (1911), pp. 56-63.
Healy, W. and Fernald, G. M. "Tests for Practical Mental Classifications."
Psychological Monographs, Vol. No. 3, 54 (1911), pp. 4-5.
Herring, John P. "Significance of Certain Elements in Intelligence Examina-
tion." Unpublished Ph.D. dissertation, Columbia University (1921).
Kelley, T. L. Educational Guidance. Teachers College Contributions to
Education, No. 71 (1914).
Kelley, T. L. "Table to Facilitate the Calculation of Partial Coefficient of
Correlation and Regression Equation." Bulletin of the University of Texas
(1916), No. 37.
Knox, H. A. "A Scale Based on the Work at Ellis Island for Establishing
Mental Defects." Journal of the American Medical Association, Vol. LXII
(March 7, 1914), pp. 741-747.
Kuhlman, F. "A Revision of the Binet-Simon System for Measuring the
Intelligence of Children." Journal of Psycho-Asthenics, Monograph Supple-
ment, No. 1 (1913), p. 14.
McCall, W. A. How to Measure in Education. Macmillan Co. (1922).
Myers, Caroline E. and Garry C. "A Group Intelligence Test." School
and Society (1919), Vol. 10, pp. 355 ff.
Peking Teachers College Weekly, No. 133 (Sept. 11, 1911), p. 3.
Pintner, R. "A Non-language Group Intelligence Test." Journal of Applied
Psychology, Vol. III (Sept., 1919).
Pintner, R. The Mental Survey. D. Appleton Co. (1918).
Pintner, R. and Patterson, D. G. "The Binet Scale and the Deaf Child."
Journal of Educational Psychology, Vol. VI (1915), pp. 303 ff.
Pressey, S. L. and Pressey, L. W. "Cross-out Tests." Journal of Applied
Psychology, Vol. 3 (1919), pp. 143-150.
Pyle, W. H. "A Study of the Mental and Physical Characteristics of the
Chinese." School and Society, Vol. VIII, No. 192 (August 31, 1918), pp.
364-369.
Stern, W. The Psychological Methods of Testing Intelligence. Translated by
G. M. Whipple. Warwick and York, Baltimore (1914).
Terman, Lewis M. The Measurement of Intelligence. Riverside Textbooks in
Education. Houghton Mifflin Co. (1916).
Thorndike, E. L. "A Standard Group Examination of Intelligence Independent
of Language." Journal of Applied Psychology, Vol. III, No. 1 (March, 1919),
pp. 13-32.
Thorndike, E. L. Mental and Social Measurements. Teachers College,
Columbia University (1919).
Walcott, G. D. "The Intelligence of Chinese Students." School and Society,
Vol. 11 (1920), pp. 474-480.
Wallin, J. E. Experimental Studies of Mental Defectives: a Critique of the
Binet-Simon Tests. Warwick and York, Baltimore (1913).
Yerkes, R. M. "Psychological Examining in the United States Army."
Memoirs of the National Academy of Science, Vol. XV (1921).
Yoakum, C. S. and Yerkes, R. M. Army Mental Tests. Henry Holt Co.
(1920).
Yerkes, R. M., Bridges, J. W. and Hardwick, P. S. A Point Scale for
Measuring Mental Ability. Warwick and York, Baltimore (1912).